PerlMonks  

Re^3: Best way to store/sum multiple-field records?

by Anonymous Monk
on Dec 23, 2014 at 01:36 UTC ( [id://1111115]=note )


in reply to Re^2: Best way to store/sum multiple-field records?
in thread Best way to store/sum multiple-field records?

I would have thought that declaring variables over and over would be less efficient than declaring them once at the start - that was my reasoning for declaring them before the loop, anyways...
Not only is 'premature optimization the root of all evil', and not only is such a micro-optimization completely meaningless, but it's actually the other way around: declaring variables inside a loop is quite a bit faster. I guess that is due to Perl's own optimizations...
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @strings = qw(
    USERID1|2215|Jones|
    USERID1|1000|Jones|
    USERID3|1495|Dole|
    USERID2|2500|Francis|
    USERID2|1500|Francis|
);

cmpthese( 1_000_000, {
    outside => sub {
        my ( $x, $y, $z );
        for (@strings) {
            ( $x, $y, $z ) = split /\|/;
        }
    },
    inside => sub {
        for (@strings) {
            my ( $x, $y, $z ) = split /\|/;
        }
    }
} );
result:
            Rate outside  inside
outside 109890/s      --    -38%
inside  176678/s     61%      --

Replies are listed 'Best First'.
Re^4: Best way to store/sum multiple-field records?
by choroba (Cardinal) on Dec 23, 2014 at 02:10 UTC
    To speed up split, specify the number of elements:
    ($x, $y, $z) = split /\|/, $_, 3;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^4: Best way to store/sum multiple-field records?
by BrowserUk (Patriarch) on Dec 23, 2014 at 02:52 UTC
    Not only is 'premature optimization the root of all evil', and not only is such a micro-optimization completely meaningless, but it's actually the other way around: declaring variables inside a loop is quite a bit faster.

    Really? Go figure. (It's (much) more complicated than that!):

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    my @strings = qw(
        USERID1|2215|Jones|
        USERID1|1000|Jones|
        USERID3|1495|Dole|
        USERID2|2500|Francis|
        USERID2|1500|Francis|
    );

    cmpthese( -1, {
        outside => sub {
            my ( $x, $y, $z );
            for (@strings) {
                ( $x, $y, $z ) = split /\|/;
            }
        },
        outside2 => sub {
            my ( $x, $y, $z );
            for (@strings) {
                ( $x, $y, $z ) = split /\|/, 3;
            }
        },
        inside => sub {
            for (@strings) {
                my ( $x, $y, $z ) = split /\|/;
            }
        },
        inside2 => sub {
            for (@strings) {
                my ( $x, $y, $z ) = split /\|/, 3;
            }
        },
    } );
    __END__
    C:\test>junk
                 Rate  outside   inside  inside2 outside2
    outside   58201/s       --     -38%     -71%     -73%
    inside    93659/s      61%       --     -53%     -57%
    inside2  197610/s     240%     111%       --     -10%
    outside2 218802/s     276%     134%      11%       --

    When you can explain that; then you may pontificate on the subject.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      When you can explain that; then you may pontificate on the subject.

      Explanation: You wrote broken code.

      $_ = 'USERID1|2215|Jones|';
      my( $x, $y, $z ) = split /\|/;
      print "( $x, $y, $z )\n";
      ( $x, $y, $z ) = split /\|/, 3;
      print "( $x, $y, $z )\n";
      __END__
      ( USERID1, 2215, Jones )
      ( 3, , )
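To spell out the defect: the second argument to split is the string to split, so `split /\|/, 3` splits the literal string "3" and never looks at $_. A minimal sketch of the broken call next to a corrected one (the sub names are hypothetical, chosen only for this illustration):

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Broken: the second argument to split is the string to split,
# so this splits the literal string "3" and ignores $record.
sub split_broken {
    my ($record) = @_;
    return split /\|/, 3;
}

# Corrected: pass the string explicitly, then the limit.
sub split_fixed {
    my ($record) = @_;
    return split /\|/, $record, 3;
}

my $record = 'USERID1|2215|Jones|';
print join( ',', map { "[$_]" } split_broken($record) ), "\n";
print join( ',', map { "[$_]" } split_fixed($record) ),  "\n";
```

Note that with a limit of 3 the third field keeps the record's trailing `|` (it comes back as `Jones|`), so on this data the limited and unlimited forms do not return identical fields.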

      Thanks for the permission. (:

      Update: Oh, and the explanation for the other part of the "(much) more complicated" mystery:

                   Rate  outside   inside  inside2 outside2
      outside   58201/s       --     -38%     -71%     -73%
      inside    93659/s      61%       --     -53%     -57%
      inside2  197610/s     240%     111%       --     -10%
      outside2 218802/s     276%     134%      11%       --

      That is, why is "inside" faster than "outside" while "outside2" is faster than "inside2"? Well, that's the classic point I try to get people to remember all the time: "11%" is simply "noise". Whether "inside2" or "outside2" will "win" depends mostly on random stuff (which one gets run first being the least random contributor that I've noticed).
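One way to get a feel for how much of a difference is noise is to benchmark two identical subs against each other and repeat the run a few times. The sketch below is only an illustration, not part of the original benchmark: the names `first`, `second` and `rates` are made up, and the iteration count is an arbitrary choice.

```perl
use strict;
use warnings;
use Benchmark qw( timethese );

my @strings = qw(
    USERID1|2215|Jones|
    USERID1|1000|Jones|
    USERID3|1495|Dole|
    USERID2|2500|Francis|
    USERID2|1500|Francis|
);

# Two deliberately identical subs: any measured difference is noise.
my %tests = (
    first  => sub { for (@strings) { my ( $x, $y, $z ) = split /\|/, $_, 3; } },
    second => sub { for (@strings) { my ( $x, $y, $z ) = split /\|/, $_, 3; } },
);

# Hypothetical helper: iterations per CPU second for each named sub.
sub rates {
    my ($count) = @_;
    my $results = timethese( $count, \%tests, 'none' );   # 'none' suppresses the report
    my %rate;
    for my $name ( keys %$results ) {
        my $timing = $results->{$name};
        my $cpu    = $timing->cpu_p;
        $cpu = 1e-9 if $cpu <= 0;                          # guard against a zero reading
        $rate{$name} = $timing->iters / $cpu;
    }
    return \%rate;
}

for my $run ( 1 .. 3 ) {
    my $rate = rates(200_000);
    printf "run %d: first/second = %.2f\n", $run, $rate->{first} / $rate->{second};
}
```

If the ratio wanders a few percent either side of 1.00 between runs, a reported gap of similar size between two real candidates should probably not be read as a win for either.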

      - tye        

        I thought it would be interesting to run BrowserUk's benchmark after having fixed the two little defects.

        This is the modified code:

        And the benchmark results:
        $ perl bench_inside_outside.pl
                     Rate  outside outside2   inside  inside2
        outside   90269/s       --     -20%     -40%     -51%
        outside2 113390/s      26%       --     -25%     -39%
        inside   151060/s      67%      33%       --     -19%
        inside2  185735/s     106%      64%      23%       --
        So (hoping the code is now correct), the results now consistently show (1) a quite strong advantage for declaring the variables inside the loop compared to doing so before entering the loop (these results are well in line with AnonMonk's reported results), and (2) that choroba's idea of specifying a limit also brings a measurable improvement (much less strong than the inside/outside declaration, but I would tend to think that a difference of about 25% is significant, and no longer noise).

        That second point is interesting. I have found in the past that specifying a limit helps when the string being split would otherwise yield more fields than the limit, presumably because Perl can stop processing the string as soon as the limit is reached. I would have thought that this advantage would largely vanish when the limit equals the number of potential fields in the string. Good to know. Thank you, choroba, for this comment.
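The effect of the limit on the returned fields can be checked directly. A small sketch, using sample strings made up for this illustration: with a limit, split stops once it has produced that many fields and leaves the remainder of the string in the last one.

```perl
use strict;
use warnings;

# Illustrative records; $long is not taken from the benchmark above.
my $long  = 'USERID1|2215|Jones|extra|fields|here';
my $exact = 'USERID1|2215|Jones|';

# No limit: every field is produced; trailing empty fields are dropped.
my @all = split /\|/, $long;

# Limit 3: splitting stops after two delimiters; the rest stays in field 3.
my @three = split /\|/, $long, 3;

# On the benchmark-style record the third field keeps the trailing "|".
my @exact_three = split /\|/, $exact, 3;

print scalar(@all), " fields without a limit\n";
print "third field with limit: $three[2]\n";
print "third field on exact record: $exact_three[2]\n";
```

The split documentation also notes that, when assigning to a list with no limit given, Perl behaves as if the limit were one more than the number of variables. That may be part of why an explicit limit of 3 still helps on these records: the unlimited form goes on to look for a fourth field after the trailing `|`.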

        Explanation: You wrote broken code.

        Thanks. That had me stumped.



      I feel better about this now ;)

      1 Peter 4:10
        I feel better about this now ;)

        S'not the first time I've been caught out by a benchmark, and trust me, it won't be the last :)

        And we are in good company!


