Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Question about curious performance of if...elsif block

by markseger (Beadle)
on Jan 16, 2012 at 15:11 UTC ( #948159=perlquestion: print w/ replies, xml ) Need Help??
markseger has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to understand why a block of if...elsif statements seem to be running slow, at least to me. So I conducted a few experiments and don't know enough about the innards of perl to explain why I'm seeing what I'm seeing or how I might be able to improve things.

First the basics - here's a very simple script that executes these statements 1M times inside a subroutine:

#!/usr/bin/perl -w for (my $i=0; $i<1000000; $i++) { test($i) } sub test { my $a=$_[0]; # return; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} }
and the difference in the times I see to execute it with and without the return statement commented out is about 3/4 of a second, so that's my approximate baselevel for executing the block of code.

The real question is I have a script that reads /proc/pid/status for all processes on my system, about 200 of them. There are 36 entries in the file and I execute the reading in a loop 1440 times (the equivalent of doing this once a minute for a day, in case anyone cares). This comes to executing that block about 1M times which is the reason for using that number in my test script above.

#!/usr/bin/perl -w my $data; for (my $i=0; $i<1440; $i++) { opendir PROC, '/proc'; while (my $pid=readdir PROC) { next if $pid!~/^\d/; open STAT, "</proc/$pid/status" or next; while ($data=<STAT>) { test($i); } close STAT; } } sub test { my $a=shift; return; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} }
I would have expected the difference between running this script with the return statement commented out and not commented out to be on the order of a second or less yet it's actually over 5 seconds. This seems like a huge difference and I'd like to know if anyone can tell me why and more importantly if there's a way to improve things.

-mark

Comment on Question about curious performance of if...elsif block
Select or Download Code
Re: Question about curious performance of if...elsif block
by JavaFan (Canon) on Jan 16, 2012 at 15:35 UTC
    First of all, you're talking about a time-difference, but there's no context. The difference between 0.01 and 1.01 seconds is far more interesting than the difference between 15h36m55s and 15h36m56s.

    Second, your first talk about a difference of 3/4 of a second, then of a difference of 5. There's quite a difference in differences.

    Third, without the return, you're doing 10 comparisons in an otherwise almost empty sub (the first time the sub is called, memory needs to be allocated, but Perl doesn't throw it away, making that in any subsequent call to the sub, the assignment to $a is relatively cheap), so I do expect there to be a difference.

    Fourth, use the NYTProf profiler, and see how much time each statement takes.

      sorry if I wasn't being clear enough...

      >First of all, you're talking about a time-difference, but there's no context. The difference between 0.01 and 1.01 seconds is far more interesting than the difference between 15h36m55s and 15h36m56s.

      I have a very large perl script (on the order of 10K lines of code). This is something that takes about 1-1/2 minutes overall to execute. does this help with the context?

      I was looking more closely at some of the execution paths for opportunities of optimization and was wondering what benefit there would be to order the options in some big if...elsif blocks more carefully ordered and thought that it might be so I did some experiments

      BUT if a 1M compares takes <1sec as it does with my first test script, it's probably not as significant as I thought, but 5 seconds with the second scrpipt begins to get my attention.

      >Second, your first talk about a difference of 3/4 of a second, then of a difference of 5. There's quite a difference in differences.

      That's my point but I guess I didn't phrase the question properly, so let me try again.

      In the first case what I'm trying to say is that to run that block of the 10 if...elsif... statements 1M times, takes about 3/4 of a second, the difference being that between the time it takes to execute them vs not-execute them.

      However, when that identical block of code is run in the second script it takes over 7 times longer. THAT is my question - why?

      >Third, without the return, you're doing 10 comparisons in an otherwise almost empty sub (the first time the sub is called, memory needs to be allocated, but Perl doesn't throw it away, making that in any subsequent call to the sub, the assignment to $a is relatively cheap), so I do expect there to be a difference.

      Agreed and as I said above, that difference is the time takes to execute the 10 comparisons.

      >Fourth, use the NYTProf profiler, and see how much time each statement takes.

      I know, I was being lazy, but given the dramatic differences in the times I'd thing even my coarser level of analysis should be valid.

      -mark

Re: Question about curious performance of if...elsif block
by BrowserUk (Pope) on Jan 16, 2012 at 18:59 UTC
    I would have expected the difference between running this script with the return statement commented out and not commented out to be on the order of a second or less yet it's actually over 5 seconds. This seems like a huge difference and I'd like to know if anyone can tell me why and more importantly if there's a way to improve things.

    The reason it takes so long -- although in my tests of your sample code show the difference to be far less than 5 seconds -- is because for the vast majority of the cases, $i = 10 through 1,440 (or 1,000,000), you are performing all nine comparisons before returning.

    The simple solution is to avoid doing those comparisons 99% (or 99.99999^) of the time:

    sub test3 { return if $_[0] > 9; my $a = $_[0]; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} }

    On my system this consistently runs even faster than your test() sub with the unconditional return (test2 below):

    C:\test>junk37 test1 took: 0.67200 seconds test2 took: 0.30307 seconds test3 took: 0.36421 seconds C:\test>junk37 test1 took: 0.66500 seconds test2 took: 0.30315 seconds test3 took: 0.29603 seconds C:\test>junk37 test1 took: 0.67800 seconds test2 took: 0.30240 seconds test3 took: 0.29488 seconds C:\test>junk37 test1 took: 0.66600 seconds test2 took: 0.30331 seconds test3 took: 0.29491 seconds C:\test>junk37 test1 took: 0.66900 seconds test2 took: 0.30349 seconds test3 took: 0.29475 seconds

    Update: Also, if your if/else cascade is bigger than 9, then you might consider using a dispatch table:

    my @dispatch = ( sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, ); sub test4 { return if $_[0] > 9; $dispatch[ $_[0] ]->( $_[0] ); }

    This runs pretty nearly exactly the same speed as test3 above when the number of blocks is low (9 in this case), but once you get to around 20 or so, it starts to win more substantially.

    And if your cases are less amenable to using an array, then you can use a hash instead.

    But do construct the dispatch table outside the subroutine. Do it (every time) inside teh sub and it will slow to a crawl.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      If you read the base note more carefully you're see that /proc/pid/status contains 36 entries on my system.

      yes I am doing the comparisons before returning and I want to so I can measure the difference between executing them and returning immediately and the difference is 5 seconds, not the 3/4 second I would have expected.

      understand now?

      -mark

        understand now?

        Apparently not.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

        Even with 36 cases, I still see nothing like the 5 second difference you apparently are:

        #! perl -slw use strict; use Time::HiRes qw[ time ]; sub test1 { my $a = $_[0]; # return; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} elsif ($a==10) {} elsif ($a==11) {} elsif ($a==12) {} elsif ($a==13) {} elsif ($a==14) {} elsif ($a==15) {} elsif ($a==16) {} elsif ($a==17) {} elsif ($a==18) {} elsif ($a==19) {} elsif ($a==20) {} elsif ($a==21) {} elsif ($a==22) {} elsif ($a==23) {} elsif ($a==24) {} elsif ($a==25) {} elsif ($a==26) {} elsif ($a==27) {} elsif ($a==28) {} elsif ($a==29) {} elsif ($a==30) {} elsif ($a==31) {} elsif ($a==32) {} elsif ($a==33) {} elsif ($a==34) {} elsif ($a==35) {} elsif ($a==36) {} } sub test2 { my $a = $_[0]; return ; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} elsif ($a==10) {} elsif ($a==11) {} elsif ($a==12) {} elsif ($a==13) {} elsif ($a==14) {} elsif ($a==15) {} elsif ($a==16) {} elsif ($a==17) {} elsif ($a==18) {} elsif ($a==19) {} elsif ($a==20) {} elsif ($a==21) {} elsif ($a==22) {} elsif ($a==23) {} elsif ($a==24) {} elsif ($a==25) {} elsif ($a==26) {} elsif ($a==27) {} elsif ($a==28) {} elsif ($a==29) {} elsif ($a==30) {} elsif ($a==31) {} elsif ($a==32) {} elsif ($a==33) {} elsif ($a==34) {} elsif ($a==35) {} elsif ($a==36) {} } sub test3 { return if $_[0] > 9; my $a = $_[0]; if ($a==1) {} elsif ($a==2) {} elsif ($a==3) {} elsif ($a==4) {} elsif ($a==5) {} elsif ($a==6) {} elsif ($a==7) {} elsif ($a==8) {} elsif ($a==9) {} elsif ($a==10) {} elsif ($a==11) {} elsif ($a==12) {} elsif ($a==13) {} elsif ($a==14) {} elsif ($a==15) {} elsif ($a==16) {} elsif ($a==17) {} elsif ($a==18) {} elsif ($a==19) {} elsif ($a==20) {} elsif ($a==21) {} elsif ($a==22) {} elsif ($a==23) {} elsif ($a==24) {} elsif ($a==25) {} elsif ($a==26) {} elsif ($a==27) {} elsif ($a==28) {} elsif ($a==29) {} elsif ($a==30) {} elsif ($a==31) {} elsif ($a==32) {} elsif ($a==33) {} elsif ($a==34) {} elsif ($a==35) {} elsif ($a==36) {} } my @dispatch = ( sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, sub {}, ); sub test4 { return if $_[0] > 9; $dispatch[ $_[0] ]->( $_[0] ); } my $start = time; test1( $_ ) for 1 .. 1e6; printf "test1 took: %.5f seconds\n", time - $start; $start = time; test2( $_ ) for 1 .. 1e6; printf "test2 took: %.5f seconds\n", time - $start; $start = time; test3( $_ ) for 1 .. 1e6; printf "test3 took: %.5f seconds\n", time - $start; $start = time; test4( $_ ) for 1 .. 1e6; printf "test4 took: %.5f seconds\n", time - $start; __END__ C:\test>junk37 test1 took: 1.70500 seconds test2 took: 0.30789 seconds test3 took: 0.29240 seconds test4 took: 0.29207 seconds

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re: Question about curious performance of if...elsif block
by sundialsvc4 (Abbot) on Jan 16, 2012 at 23:52 UTC

    One trick that might help here is to define an array (or a hash...) of subs; that is to say, of “code references.”   Directly fetch and execute (if it exists) the specific subroutine that you want to execute.   So you’re not performing 39 if-tests in a row in order to determine that you want to execute subroutine #40...

    I am not asserting that it will help in your particular, very rarified and performance-hungry case.   Only empirical testing would establish that.

      > Directly fetch and execute (if it exists) the specific subroutine that you want to execute.

      For those of use who not only remember but actually used them, this sounds like fortran's computed goto statement. However in my case it's not that I want to execute different subs, but rather blocks of code. Can I do the same thing using labels? That really would make it a computed goto. ;)

      However I would still like to hear any and all theories about why this curious behavior. The script in question is 'collectl', an opensouce project I released a number of years ago. It's a very light-weight system monitoring tool and am always looking for way to make it even lighter-weight and this feels like an opportunity.

      On a different note, my comments about rhel6.2 were wrong. I forgot the VM has <100 processes so I'm actually executing have the number of tests and so the performance improvement over the earlier tests seems as little better but not that much better.

      -mark

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://948159]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (11)
As of 2014-12-18 07:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (44 votes), past polls