Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

When should I use a dispatch table?

by Limbic~Region (Chancellor)
on Nov 30, 2006 at 23:00 UTC ( #587072=perlmeditation: print w/ replies, xml ) Need Help??

All,
A typical dispatch table is a technique used to translate:
if ($item eq 'thing1') { # ... } elsif ($item eq 'thing2') { # ... } elsif ($item eq 'thing3') { # ... } elsif ($item eq 'thing4') { # ... } else { # ... }
Into: exists $dispatch{$item} ? $dispatch{$item}->() : $dispatch{default}->();

In addition to opportunities for code reduction and clean maintainable code, dispatch tables can be a runtime optimization. The if/eslif chain is a linear scan O(N) while a hash is purported to be O(1). The hash does have the overhead of looking up the key, dereferencing the value, and executing the code ref. I assumed that in what only take a few equality checks to pay for this overhead.

In this node, I claimed that my version of the code was much more efficient. That node inspired madbombX to convert an if/eslif chain into a dispatch table and benchmark it. Unfortunately, the bench was flawed and the results skewed. This lead me to write code to generate a fair benchmark based off user input.

#!/usr/bin/perl use strict; use warnings; use Getopt::Std; use HTML::Template; use List::Util; my %opt; get_args(\%opt); my $template = HTML::Template->new(filename => 'bench.tmpl'); $template->param(CHAIN_LENGTH => $opt{n}); $template->param(UNMATCHED_ITEMS => $opt{u}); my @chain = map {ITEM => $_}, 2 .. $opt{n}; $template->param(CHAIN => \@chain); my $unmatched_items = int(($opt{u} / 100) * $opt{t}); $unmatched_items ||= 1 if $opt{u}; my @input = (-1) x $unmatched_items; my $total = $opt{t} - $unmatched_items; my @weight = $opt{w} ? sort {$b <=> $a} grep $_, split /[%,]|\s+/, $opt{w} : (int(100/$opt{n})) x $opt{n}; for (1 .. $opt{n}) { my $amount = shift @weight; push @input, ($_) x (int(($amount/100) * $total) || 1); } $_ = {FIND => $_} for @input; @input = List::Util::shuffle(@input) if ! $opt{o}; $template->param(BUILD_FIND => \@input); print $template->output(); sub get_args { my $opt = shift @_; my $Usage = qq{Usage: $0 [options] -h : This help message -n : The (n)umber of items in the if/elsif chain Default: 5 -t : The (t)total number of items to process This number need not be the same as your actual data. It should be sufficient to exercise the options Default: -n * 10 -u : The percentage of (u)nmatched items to include Typical if/elsif chains end with a catch-all else This option allows you to determine how often this is use +d Default: 0 -o : Specifies if the if/elsif chain should be (o)rdered If the input data is well known, it is possible to order the if/elsif chains to ensure the most common items are f +irst Default: 0 (off) -w : The (w)eight in percentages of the matched input Ignored unless -o option also specified There must be as many #% as there are -n to work correctl +y The #% must add up to 100 to work correctly The -b option, if present, takes precedence before this o +ption For instance, if you have n = 5, t = 100, -o, and u = 10 -w"40%, 30%, 15%, 10%, 5%" would equate to 100% of the re +maining 90% Default: N/A } . "\n"; getopts('hn:t:u:w:o', $opt) or die $Usage; die $Usage if $opt->{h}; delete $opt->{w} if ! exists $opt->{o}; $opt->{n} = 5 if ! defined $opt->{n}; $opt->{t} = $opt->{n} * 10 if ! defined $opt->{t}; $opt->{u} = 0 if ! defined $opt->{u}; }
bench.tmpl
#!/usr/bin/perl use strict; use warnings; use Benchmark 'cmpthese'; my %dispatch = map {$_ => sub {my $res = "I found it"}} 1 .. <TMPL_VAR + NAME="CHAIN_LENGTH">; <TMPL_IF NAME="UNMATCHED_ITEMS"> $dispatch{default} = sub {my $res = "I did not find it"}; </TMPL_IF> my @find; <TMPL_LOOP NAME=BUILD_FIND> push @find, <TMPL_VAR NAME=FIND>; </TMPL_LOOP> cmpthese(-10, { dispatch => sub { for (@find) { <TMPL_IF NAME="UNMATCHED_ITEMS"> if (exists $dispatch{$_}) { $dispatch{$_}->(); } else { $dispatch{default}->(); } <TMPL_ELSE> $dispatch{$_}->(); </TMPL_IF> } }, if_else => sub { for (@find) { if ($_ eq '1') { my $res = "I found it"; } <TMPL_LOOP NAME=CHAIN> elsif($_ eq '<TMPL_VAR NAME=ITEM>') { my $res = "I found it"; } </TMPL_LOOP> <TMPL_IF NAME="UNMATCHED_ITEMS"> else { my $res = "I did not find it"; } </TMPL_IF> } }, });

The results were quite amazing. No matter how I tweaked the settings, with only 6 items in the if/elsif chain I could not make the dispatch table win. It is still possible that the other optimizations I made to the code in that node made it more efficient but without a representative sample of the real input - I can't be sure.

The breakpoint without unmatched items is about 12 on my system. There are plenty of user defineable variables so perhaps you might want to question code you have that uses dispatch tables play around. While the code reduction and maintainability is still an advantage, the performance boost may not be all it is cracked up to be.

Update: Minor change to wording in last paragraph after realizing it may imply I was recommending not using dispatch tables because they were not as efficient as I assumed them to be originally.

See Also:

Cheers - L~R

Comment on When should I use a dispatch table?
Select or Download Code
Re: When should I use a dispatch table?
by traveler (Parson) on Nov 30, 2006 at 23:37 UTC
    I was pretty surprised by this, too. I have comments on your test, and then a theory. First the comments:

    The values in @find are integers, but in the if_else comparison tree, you test for string equality.

    Testing the hash for existance of a key first, seems wasteful. It might be faster if you did an eval and only acted on failure in that case. Of course, I haven't benched that, yet.

    My theory: maybe, just maybe, perl does the thing C used to (and maybe still does) and and builds if-else chains into a hash table for evaluation. Dunno.

    --traveler

      traveler,
      The values are integers only because it made enumerating easier. The string equality is intentional. The 1 .. n could have as easily been 'a' .. 'z'.

      The testing for the hash key existance is necessary sometimes (default condition). The template didn't discriminate and always added it. Thanks to your keen eye, it has been fixed. It did not change the results that much with or without default conditions.

      Cheers - L~R

        The values are integers only because it made enumerating easier. The string equality is intentional. The 1 .. n could have as easily been 'a' .. 'z'.

        OK. But doesn't that make the ifelse code slower because it has to stringify and/or do string comparison, while the hash can do an integer comparison?

        I did try another test. Since the values were integers I replaced the dispatch table hash with an array and performance improved, but not enough to catch the ifelse. I am still quite confused.

        runrig++, very clever. I had to stare at it for a second to realize what was going on.

        runrig,
        The issue that traveler pointed out was that I tested if a key existed when I didn't need to. Only when the user defines the -u option is it necessary to test for the hash key's existance. Without the -u, there is no default condition and the exists can be avoided. I corrected the template accordingly.

        You have mentioned an alternative way of determining which sub to dispatch. Unfortunately it may actually be less efficient than exists. You are fetching the value and testing it for truth where exists need only check to see if the key is present in the hash.

        Cheers - L~R

Re: When should I use a dispatch table?
by GrandFather (Cardinal) on Nov 30, 2006 at 23:55 UTC

    I must admit that my eyes popped somewhat at "I have re-written your code to be much more efficient." in that node. However I was too busy to bother benchmarking it and the dispatch table is much cleaner than the original code in any case so I let it go.

    I'm glad to see that you've followed up with some actual testing and the result doesn't much surprise me. Another thing to consider is that if at the time of coding you know the likely hit rate for each of the tests, the chained if/elsif can be much more efficient by testing the most likely hits first. That is still nasty code to write, test and maintain compared with the dispatch table, but stooping to "assembly language" techniques has to be a win sometimes. :)


    DWIM is Perl's answer to Gödel
      GrandFather,
      I am not sure if you looked at the options of this benchmark generating code but I included the ability for testing likely hit rates.

      Cheers - L~R

        I admit to my shame that I didn't do more than glance at your code and missed that (let's blame pressure of work). However I really intended to suggest that an implementor of such code ought consider the effect of such an optimisation, rather than that you would be remiss in not accounting for it in your benchmark. Kudos for allowing for hit rate optimisation though!


        DWIM is Perl's answer to Gödel
Re: When should I use a dispatch table?
by chromatic (Archbishop) on Dec 01, 2006 at 00:02 UTC
    While the code reduction and maintainability is still an advantage, the performance boost may not be all it is cracked up to be.

    I summon chromatic's second rule of performance optimization: You're doing IO! That's almost always the slow part!

      chromatic,
      In private /msg with madbombX I made this very point. When benchmarking it is important to eliminate as many factors unrelated to what you are testing as possible. In my benchmark generating code however, no IO comes into play (that I can see anyway). This is strictly determining the difference between a series of equality tests (linear search) and a hash lookup, reference dereference, and nearly empty sub execution.

      Cheers - L~R

        In mumble years of programming, I can't think of a single example where the overhead of a dispatch table versus cascading conditionals made a whit of difference compared to the cost of performing IO. Therefore, unless I ever write a FFT or a petabyte-Ethernet driver, I don't see much point to worrying about a handful of milliseconds of difference between two constructs where one is so much more maintainable than the other.

        Then again, almost every program I've ever written performs some sort of IO somewhere.

        As it's the holidays, can I wish that Benchmark refused to give meaningful results for syntax questions and only helped profile various algorithms?

        chromatic already has the IO part handled but your comparison is also getting dwarfed by making those function calls. Remove those too, they're one of the more expensive things you can do in perl (except for IO and just deliberately sleeping).

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: When should I use a dispatch table?
by BrowserUk (Pope) on Dec 01, 2006 at 01:07 UTC
    ... while a hash is purported to be O(1).

    Don't blame the hash lookup performance here.

    The thing that slows the dispatch table is simply that subroutines in Perl are slow relative to inline blocks.

    If you make it so that the ifelse chain also has to do hash lookups, and the dispatch table doesn't call the subroutines, it (unsurprisingly) makes the dispatch table fastest even for very low numbers of choices:

    C:\test>587072 -N=10 -KEYS=a,b dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } ) for @da +ta; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a } } elsif( $_ eq "b" ) { my $x = $dispatch{ b } } else{ my $x = $dispatch{ default } } } Rate ifelse dispatch ifelse 39306/s -- -29% dispatch 55351/s 41% --

    Equally, if we have both call the subroutines, the dispatch table wins:

    C:\test>587072 -N=10 -KEYS=a,b dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } )->() for + @data; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a }->() } elsif( $_ eq "b" ) { my $x = $dispatch{ b }->() } else{ my $x = $dispatch{ default }->() } } Rate ifelse dispatch ifelse 21966/s -- -15% dispatch 25880/s 18% -- C:\test>587072 -N=10 -KEYS=a,b,c dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } )->() for + @data; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a }->() } elsif( $_ eq "b" ) { my $x = $dispatch{ b }->() } elsif( $_ eq "c" ) { my $x = $dispatch{ c }->() } else{ my $x = $dispatch{ default }->() } } Rate ifelse dispatch ifelse 16250/s -- -22% dispatch 20798/s 28% --

    But if you leave the lookups in both, but only have the dispatch table call the subs, the ifelse chain wins upto similar numbers to those you report:

    C:\test>587072 -N=10 -KEYS=a,b,c,d,e,f,g dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } )->() for + @data; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a } } elsif( $_ eq "b" ) { my $x = $dispatch{ b } } elsif( $_ eq "c" ) { my $x = $dispatch{ c } } elsif( $_ eq "d" ) { my $x = $dispatch{ d } } elsif( $_ eq "e" ) { my $x = $dispatch{ e } } elsif( $_ eq "f" ) { my $x = $dispatch{ f } } elsif( $_ eq "g" ) { my $x = $dispatch{ g } } else{ my $x = $dispatch{ default } } } Rate dispatch ifelse dispatch 10511/s -- -8% ifelse 11428/s 9% -- C:\test>587072 -N=10 -KEYS=a,b,c,d,e,f,g,h dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } )->() for + @data; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a } } elsif( $_ eq "b" ) { my $x = $dispatch{ b } } elsif( $_ eq "c" ) { my $x = $dispatch{ c } } elsif( $_ eq "d" ) { my $x = $dispatch{ d } } elsif( $_ eq "e" ) { my $x = $dispatch{ e } } elsif( $_ eq "f" ) { my $x = $dispatch{ f } } elsif( $_ eq "g" ) { my $x = $dispatch{ g } } elsif( $_ eq "h" ) { my $x = $dispatch{ h } } else{ my $x = $dispatch{ default } } } Rate dispatch ifelse dispatch 9558/s -- -1% ifelse 9635/s 1% -- C:\test>587072 -N=10 -KEYS=a,b,c,d,e,f,g,h,i dispatch => # line 1 "dispatch.pl" my $x += ( $dispatch{ $_ } ||= $dispatch{ default } )->() for + @data; ifelse => # line 1 "ifelse.pl" for( @data ) { if( $_ eq "a" ) { my $x = $dispatch{ a } } elsif( $_ eq "b" ) { my $x = $dispatch{ b } } elsif( $_ eq "c" ) { my $x = $dispatch{ c } } elsif( $_ eq "d" ) { my $x = $dispatch{ d } } elsif( $_ eq "e" ) { my $x = $dispatch{ e } } elsif( $_ eq "f" ) { my $x = $dispatch{ f } } elsif( $_ eq "g" ) { my $x = $dispatch{ g } } elsif( $_ eq "h" ) { my $x = $dispatch{ h } } elsif( $_ eq "i" ) { my $x = $dispatch{ i } } else{ my $x = $dispatch{ default } } } Rate ifelse dispatch ifelse 8092/s -- -6% dispatch 8566/s 6% --

    I thought I read somewhere that subroutine call performance had been improved in bleedperl, but I cannot find the reference?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      Don't blame the hash lookup performance here.

      I didn't. To quote myself from the very next sentence: The hash does have the overhead of looking up the key, dereferencing the value, and executing the code ref. I knew that subroutine overhead is expensive - I even wrote a node inquiring into optimizing them a few years ago: (Inline subs?).

      I am glad you replied and allocated the majority of the performance hit to the correct culprit. Unfortunately, typical dispatch tables use code refs. I hear lots of good things about bleed perl so hopefully 5.10 will be released soon.

      Cheers - L~R

        I was trying to point out that "the overhead of looking up the key, dereferencing the value," is marginal relative to the overhead of "executing the code ref".

        Even that last phrase slightly conflates

        • the process of invoking the sub;
        • executing the code it contains.

        I realise that you know the difference, but your wording--combined with your use of "purported"--tended to imply that "looking up the key" was a significant factor. Which it isn't. I sought to demonstrate this.

        Unfortunately, typical dispatch tables use code refs.

        It wouldn't be a dispatch table if it didn't :). I agree, you cannot avoid the overhead of calling the subroutine.

        I had a couple of cracks at coding a 'macro processor'--really just a sub inliner a couple of years ago.one of them may even be posted here somewhere?--using source filters, but it was never very satisfactory.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: When should I use a dispatch table?
by Anonymous Monk on Dec 01, 2006 at 15:17 UTC
    I will not bother to find out where the break-even point on my system lies. As chromatic says, the milliseconds saved are unimportant. If they would be important, I would neither use a Perl if-else chain, nor a Perl dispatch table. Instead, I'd write a piece of C code (which could be an if/else chain, a switch or a dispatch table).

    Luckely for me, most of the time, the milliseconds are not important, and I can use both Perl, and whatever technique I find more readable/maintainable.

Re: When should I use a dispatch table?
by jdporter (Canon) on Dec 01, 2006 at 18:07 UTC

    I think performance optimization should not be the principle determining factor in deciding which technique to use. Unless your application is running unacceptably slowly and you've identified the multi-branch if/else section as a bottleneck, it's ridiculous to change that to something else just for a performance gain.

    Maintainability, and design — as in consistency of — those are of primary importance.

    I like dispatch tables. They're clever. But at the same time, they raise a red flag for me — the same one I see whenever hashes are used to store a static set of entities in an application: It completely undermines the safety of strict. (There are steps you can take to regain some of this safety, such as using Tie::StrictHash.)

    The solution is to make the subs named class methods. I recently did this on a project at work.

    I had two dispatch tables, one for the "real" functions and one for a set of stubs, to be used when debugging.

    Old way:

    my %real_functions = ( wipe_system => sub { system "rm -rf /" }, ); my %debug_stubs = ( wipe_system => sub { warn "wiping system (no, not really)\n" }, ); my $funcs = $debug ? \%debug_stubs : \%real_functions; $funcs->{'wipe_system'}->();

    New way:

    { package SystemFunctions::Real; sub wipe_system { shift; # will be the class name system "rm -rf"; } } { package SystemFunctions::Debug; sub wipe_system { shift; # will be the class name warn "wiping system (no, not really)\n"; } } my $funcs = $debug ? 'SystemFunctions::Real' : 'SystemFunctions::Debug'; $funcs->wipe_system();

    Now, it sometimes won't be convenient to do this, due to the complication of packages. In my case, it was; in fact, it was a significant improvement, since it gave me a convenient place to encapsulate all my "system functions", which hitherto had lived in the main namespace.

    In the very simplest cases, you don't need to worry about packages at all:

    my %dispatch = ( incr => sub { $x++ }, decr => sub { $x-- }, ); $dispatch{$op}->();
    becomes
    sub incr { shift; $x++ } sub decr { shift; $x-- } main->$op();

    Note: I only left the shift in as a reminder that it will always be necessary unless the sub takes no arguments (or, as I sometimes do, you pop the args rather than shifting them).

    Finally, what about default cases? One possibility is to use AUTOLOAD.

    We're building the house of the future together.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://587072]
Approved by imp
Front-paged by liverpole
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (13)
As of 2014-07-28 19:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (207 votes), past polls