Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

elsif chain vs. dispatch

by sflitman (Hermit)
on Apr 26, 2009 at 21:32 UTC ( #760195=perlquestion: print w/ replies, xml ) Need Help??
sflitman has asked for the wisdom of the Perl Monks concerning the following question:

Hello, my hearties! Support the Pirate Party! Aarhhh, matey! P-) <-- (smiley wearing an eye patch)

I ran some benchmarks with the following code to see if elsif chains were faster than a dispatch table. I pray for your wisdom as to why a chain of elsifs which should be O(N) is faster than a hash operation which ought to be O(1). Is there some Perl Magick going on here?

#!/usr/bin/perl # elsifbench - which is faster, a chain of elsif or a dispatch table? # SSF 112208 042609 use strict; use Benchmark qw/:all/; sub random { int rand shift } my @letters=('A'..'Z','a'..'z','0'..'9'); my $nLetters=scalar @letters; sub elsifchain { my $x=$letters[random($nLetters)]; my $y='xyzzy!' x random(1000); if ($x eq 'A') { split(/!/,$y); } elsif ($x eq 'B') { split(/!/,$y); } elsif ($x eq 'C') { split(/!/,$y); } elsif ($x eq 'D') { split(/!/,$y); } elsif ($x eq 'E') { split(/!/,$y); } elsif ($x eq 'F') { split(/!/,$y); } elsif ($x eq 'G') { split(/!/,$y); } elsif ($x eq 'H') { split(/!/,$y); } elsif ($x eq 'I') { split(/!/,$y); } elsif ($x eq 'J') { split(/!/,$y); } elsif ($x eq 'K') { split(/!/,$y); } elsif ($x eq 'L') { split(/!/,$y); } elsif ($x eq 'M') { split(/!/,$y); } elsif ($x eq 'N') { split(/!/,$y); } elsif ($x eq 'O') { split(/!/,$y); } elsif ($x eq 'P') { split(/!/,$y); } elsif ($x eq 'Q') { split(/!/,$y); } elsif ($x eq 'R') { split(/!/,$y); } elsif ($x eq 'S') { split(/!/,$y); } elsif ($x eq 'T') { split(/!/,$y); } elsif ($x eq 'U') { split(/!/,$y); } elsif ($x eq 'V') { split(/!/,$y); } elsif ($x eq 'W') { split(/!/,$y); } elsif ($x eq 'X') { split(/!/,$y); } elsif ($x eq 'Y') { split(/!/,$y); } elsif ($x eq 'Z') { split(/!/,$y); } elsif ($x eq 'a') { split(/!/,$y); } elsif ($x eq 'b') { split(/!/,$y); } elsif ($x eq 'c') { split(/!/,$y); } elsif ($x eq 'd') { split(/!/,$y); } elsif ($x eq 'e') { split(/!/,$y); } elsif ($x eq 'f') { split(/!/,$y); } elsif ($x eq 'g') { split(/!/,$y); } elsif ($x eq 'h') { split(/!/,$y); } elsif ($x eq 'i') { split(/!/,$y); } elsif ($x eq 'j') { split(/!/,$y); } elsif ($x eq 'k') { split(/!/,$y); } elsif ($x eq 'l') { split(/!/,$y); } elsif ($x eq 'm') { split(/!/,$y); } elsif ($x eq 'n') { split(/!/,$y); } elsif ($x eq 'o') { split(/!/,$y); } elsif ($x eq 'p') { split(/!/,$y); } elsif ($x eq 'q') { split(/!/,$y); } elsif ($x eq 'r') { split(/!/,$y); } elsif ($x eq 's') { split(/!/,$y); } elsif ($x eq 't') { split(/!/,$y); } elsif ($x eq 'u') { split(/!/,$y); } elsif ($x eq 'v') { split(/!/,$y); } elsif ($x eq 'w') { split(/!/,$y); } elsif ($x eq 'x') { split(/!/,$y); } elsif ($x eq 'y') { split(/!/,$y); } elsif ($x eq 'z') { split(/!/,$y); } elsif ($x eq '0') { split(/!/,$y); } elsif ($x eq '1') { split(/!/,$y); } elsif ($x eq '2') { split(/!/,$y); } elsif ($x eq '3') { split(/!/,$y); } elsif ($x eq '4') { split(/!/,$y); } elsif ($x eq '5') { split(/!/,$y); } elsif ($x eq '6') { split(/!/,$y); } elsif ($x eq '7') { split(/!/,$y); } elsif ($x eq '8') { split(/!/,$y); } elsif ($x eq '9') { split(/!/,$y); } else { warn "Huh?"; } } sub dispatch { my $x=$letters[random($nLetters)]; my $y='xyzzy!' x random(1000); my %dispatch=( A=>sub { split(/!/,$y); }, B=>sub { split(/!/,$y); }, C=>sub { split(/!/,$y); }, D=>sub { split(/!/,$y); }, E=>sub { split(/!/,$y); }, F=>sub { split(/!/,$y); }, G=>sub { split(/!/,$y); }, H=>sub { split(/!/,$y); }, I=>sub { split(/!/,$y); }, J=>sub { split(/!/,$y); }, K=>sub { split(/!/,$y); }, L=>sub { split(/!/,$y); }, M=>sub { split(/!/,$y); }, N=>sub { split(/!/,$y); }, O=>sub { split(/!/,$y); }, P=>sub { split(/!/,$y); }, Q=>sub { split(/!/,$y); }, R=>sub { split(/!/,$y); }, S=>sub { split(/!/,$y); }, T=>sub { split(/!/,$y); }, U=>sub { split(/!/,$y); }, V=>sub { split(/!/,$y); }, W=>sub { split(/!/,$y); }, X=>sub { split(/!/,$y); }, Y=>sub { split(/!/,$y); }, Z=>sub { split(/!/,$y); }, a=>sub { split(/!/,$y); }, b=>sub { split(/!/,$y); }, c=>sub { split(/!/,$y); }, d=>sub { split(/!/,$y); }, e=>sub { split(/!/,$y); }, f=>sub { split(/!/,$y); }, g=>sub { split(/!/,$y); }, h=>sub { split(/!/,$y); }, i=>sub { split(/!/,$y); }, j=>sub { split(/!/,$y); }, k=>sub { split(/!/,$y); }, l=>sub { split(/!/,$y); }, m=>sub { split(/!/,$y); }, n=>sub { split(/!/,$y); }, o=>sub { split(/!/,$y); }, p=>sub { split(/!/,$y); }, q=>sub { split(/!/,$y); }, r=>sub { split(/!/,$y); }, s=>sub { split(/!/,$y); }, t=>sub { split(/!/,$y); }, u=>sub { split(/!/,$y); }, v=>sub { split(/!/,$y); }, w=>sub { split(/!/,$y); }, x=>sub { split(/!/,$y); }, y=>sub { split(/!/,$y); }, z=>sub { split(/!/,$y); }, 0=>sub { split(/!/,$y); }, 1=>sub { split(/!/,$y); }, 2=>sub { split(/!/,$y); }, 3=>sub { split(/!/,$y); }, 4=>sub { split(/!/,$y); }, 5=>sub { split(/!/,$y); }, 6=>sub { split(/!/,$y); }, 7=>sub { split(/!/,$y); }, 8=>sub { split(/!/,$y); }, 9=>sub { split(/!/,$y); } ); if (exists $dispatch{$x}) { $dispatch{$x}->(); } else { warn "Huh?"; } } cmpthese(100000,{elsif=>\&elsifchain,dispatch=>\&dispatch}); exit; __END__ A..Z = 26 items Benchmark: timing 100000 iterations of dispatch, elsif... dispatch: 56 wallclock secs (55.97 usr + 0.03 sys = 56.00 CPU) @ 17 +85.71/s (n=100000) elsif: 35 wallclock secs (35.02 usr + 0.00 sys = 35.02 CPU) @ 28 +55.51/s (n=100000) Rate dispatch elsif dispatch 1786/s -- -37% elsif 2856/s 60% -- A..Z, a..z, 0..9 = 62 items Benchmark: timing 100000 iterations of dispatch, elsif... dispatch: 93 wallclock secs (91.94 usr + 0.01 sys = 91.95 CPU) @ 10 +87.55/s (n=100000) elsif: 41 wallclock secs (35.22 usr + 0.01 sys = 35.23 CPU) @ 28 +38.49/s (n=100000) Rate dispatch elsif dispatch 1088/s -- -62% elsif 2838/s 161% --

Thanks in advance for your comments.

Humbly yours,
Aahhrh!
SSF

Comment on elsif chain vs. dispatch
Download Code
Re: elsif chain vs. dispatch
by Bloodnok (Vicar) on Apr 26, 2009 at 22:11 UTC
    I don't know for sure, but I wouldn't mind betting that it has to do with the traditional overheads associated with the use of hashes - in this case, hash lookup c/w direct code.

    As I say I don't know, but a pound to a pinch of sh!t that one of the more learned brethren will put me (rightfully) in my place.

    Update:

    I should have realised from my initial observation that the difference could be at least in part put down to the difference between run-time and compile-time machinations ... as since pointed out elsewhere.

    A user level that continues to overstate my experience :-))
Re: elsif chain vs. dispatch
by Burak (Chaplain) on Apr 26, 2009 at 22:19 UTC
    Your example is not practical. Dispatch tables are usually defined once outside the code that runs it. I suggest creating a test code like:
    my %dispatch; sub elsifchain { ... } sub dispatch { ... }
      That's a good point - the dispatch table is created on every invocation, the elsif code variant is only compiled once. The second point is that the split calls actually obscure the test results, not clarify them.

      So what I did was to change all the alternatives to return a literal 'A' instead, and to build the dispatch table only once. That's the result:

      Rate elsif dispatch elsif 252062/s -- -73% dispatch 927942/s 268% --
        I knew I was doing something dumb. Thanks!

        SSF

Re: elsif chain vs. dispatch
by almut (Canon) on Apr 26, 2009 at 22:23 UTC

    If you don't recreate the dispatch table upon every invocation of dispatch() by placing it outside of the routine, the difference already is much smaller:

    Rate dispatch elsif dispatch 9363/s -- -12% elsif 10627/s 13% --

    even though in that case, there's still a disadvantage on the side of the dispatch table, because the $y needs to be passed to the anonymous sub...

    my %dispatch=( A=>sub { split(/!/,$_[0]); }, ... ); sub dispatch { my $x=$letters[random($nLetters)]; my $y='xyzzy!' x random(1000); if (exists $dispatch{$x}) { $dispatch{$x}->($y); } else { warn "Huh?"; } }

      $y could be passed as a reference to avoid the string copy further reduce the disadvantage, particularly for large strings.

      my %dispatch=( A=>sub { split(/!/,$$_[0]); }, ... ); sub dispatch { my $x=$letters[random($nLetters)]; my $y='xyzzy!' x random(1000); if (exists $dispatch{$x}) { $dispatch{$x}->(\$y); } else { warn "Huh?"; } }

      On the other hand, maybe the parameter string isn't copied when the function references $_[0] directly?

        I had tried that.  And had first made the same mistake (and thus got a speed boost of around 800% :) —> the $$_[0] would need to be ${$_[0]}. With that fixed, there's virtually no difference...   In other words, the string isn't copied, even without using references ($_[0] is an alias).

Re: elsif chain vs. dispatch
by roubi (Hermit) on Apr 26, 2009 at 22:26 UTC
    It doesn't seem fair that you would be creating the hash 100000 times whereas your elsif chain gets compiled once?
Re: elsif chain vs. dispatch
by chromatic (Archbishop) on Apr 26, 2009 at 22:59 UTC

    Even though looking up an element of a hash tends to be an O(1) operation, its algorithmic complexity only describes the scaling factor of the algorithm as n increases. If you rearranged the if/else branches in decreasing order of probability according to your expected corpus, it would be faster than the hash lookup until you reached (wild guess) a dozen or so entries in the hash, or if your working corpus deviated significantly from your expected corpus.

    A good benchmark with working example data would reveal more, but remember that:

    • This depends on your version of Perl, your workload, and your machine.
    • Getting the best performance possible is a matter of tuning.
    • Perl is sensitive to fluctuations between different optrees and working sets.
    • This is unlikely a bottleneck in any meaningful application.
      For me, it is that the codebase is becoming unwieldy, so I wanted to switch to the dispatch style, and I was (honest) going to be putting it outside a sub so it only gets compiled once!

      Have you ever considered using the symbol table as the hash, would this be any good:

      $::{"handle_fieldcode_$code"}->($args); # for $code = 'A'..'Z'
      ??
      SSF

        The slow bit is the function call, not the hash lookup which you'd also have to do with the symtab anyway. With a hash, you have better control over what inputs are allowed and how they are dispatched.

        With symtab:

        my $handler = do { no strict 'refs'; \&{"handle_fieldcode_$code"} }; ...handle expections somehow?... die if !$handler; $handler->(@args);

        With hash;

        my $handler = $lookup{$code}; die if !$handler; $handler->(@args);

        with a one-time setup of

        my %lookup = map { no warnings 'refs'; $_ => \&{"handle_fieldcode_$_"} } 'A'..'Z'; ...modify %lookup for expections...
Re: elsif chain vs. dispatch
by JavaFan (Canon) on Apr 27, 2009 at 07:52 UTC
    You are not comparing looking up by iteration and looking up something in a hash. You are comparing looking up by iteration vs. building a hash, finding something in the hash, then doing a sub call. On top of that, you're calling split on a large string to make it even harder to draw any conclusions.

    But even if you fix all those issues, the still you can only draw conclusions if you have a large number of cases, and each case is picked equally well. Typically, one dispatches between far less cases, and not every case occurs equally often.

Re: elsif chain vs. dispatch
by DrHyde (Prior) on Apr 27, 2009 at 10:31 UTC

    When you say that something is O(N) or O(N^2) or whatever, you are saying that as N changes then the resource in question (time or memory normally) changes with that relation to it. So if something is O(N) and N doubles, then the time taken doubles. What this ignores is that the time taken is actually c*N or k*1, and that for different algorithms the constants will be different.

    So, the if/elsif/elsif/elsif/else chain actually takes c*N seconds, and the hash lookup and subroutine dispatch takes k*1 seconds. I'm not at all surprised that for small N then cN is smaller than k because it involves very few calculations so its constant term is a *lot* less than that for the hash lookup.

    I shall now proceed to pull some random numbers out of my arse.

    Calculating a hash takes of the order of a few hundreds of machine cycles (let's say a thousand). Finding the right place in the if/elsif/elsif/else chain will take more like a few tens of cycles. In fact, if the compiler optimises really well, checking and ignoring each condition will take just two machine cycles. Let's call it ten because this is perl, not C.

    If we also assume that on average it has to check N/2 conditions then the hash would only win for N > 200. That's ignoring, of course, any extra overhead from setting up both cases in the first place.

      When you say that something is O(N) or O(N^2) or whatever, you are saying that as N changes then the resource in question (time or memory normally) changes with that relation to it. So if something is O(N) and N doubles, then the time taken doubles.
      To be pedantic, that's not true. O(N) means that the growth is at most linear. O(N^2) means that the growth is at most quadratic. This means that any algorithm that is O(N) is also O(N log N) and O(N^2).

      If you want to express that an algorithm is linear (and not worst case linear), the correct function is use is called Θ.

      Note also that hash lookups are, worst case, Θ(N). There's always a chance that all hash keys map to the same value, resulting in a linear list that needs to be searched.

        Note also that hash lookups are, worst case, Θ(N). There's always a chance that all hash keys map to the same value, resulting in a linear list that needs to be searched.

        I recall that perls from 5.8.3 or so have code in place to watch out for this sort of degenerate case, and will rehash to prevent this from occurring.

        • another intruder with the mooring in the heart of the Perl

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://760195]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (11)
As of 2014-09-23 18:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (238 votes), past polls