Beefy Boxes and Bandwidth Generously Provided by pair Networks RobOMonk
go ahead... be a heretic
 
PerlMonks  

Premature and micro optimization...

by chargrill (Parson)
on Jun 11, 2007 at 05:36 UTC ( #620413=perlquestion: print w/ replies, xml ) Need Help??
chargrill has asked for the wisdom of the Perl Monks concerning the following question:

... Surely the root of all evil, but can't a guy be curious?

First a little background, and apologies for the length of this post.

I'm working on a legacy web system.

When the user (or the server) encounters an exceptional situation, the web server generates an error "ticket" and display a "page not found" page. Reports of these tickets are emailed to a mailing list - it's really handy if we lose a database from the cluster and dozens or hundreds (sometimes thousands!) of users start generating tickets. Unfortunately, we'll also get tickets if web bots are crawling the site, etc.

It's been one of my sub-goals to rid our "ticket" system of spurious tickets. Not generating a ticket if the user-agent matched a web bot was easy.

We have users and their sessions, and these are tracked through a number of redundant methods, one of which is appending user ID and session information to the URI.

The User ID format is very specific - [0-9A-Z]{12}. (Then a dot, then the session ID.) If our system sees lowercase letters, it considers this an exceptional situation, and generates an error ticket. (Not my idea.) This is even though an ID would be valid should it only be translated to using uppercase letters. For most users (and their user-agent) - this isn't an issue. But, it appears as though some user agents attempt to pre- (or re-) fetch oft-visited pages (FunWebProducts springs to mind). Ignore the proper URI, try to fetch using a lowercase ID, ticket gets generated, I get an "error" in my mailbox. Annoying.

Should be easy to fix, no?

Long story short, the old code did this:

sub id_parse { my $string = shift; return split( /\./, $string, 2 ); }

My change was simple:

sub id_parse { my $string = shift; my( $id, $session ) = split /\./, $string, 2; $id =~ tr/a-z/A-Z/; return( $id, $session ); }

I test it out on the development server. It works, it doesn't give me a "page not found" error, doesn't generate a ticket - problem solved.

I ask my coworker for a code review.

My coworker taps me on the shoulder and says "So, we do this tr/// on every ID even though only a tiny subset of them are bad? I clearly have NO idea how expensive that is."

To borrow a phrase from bobf, I "went all perlmonks on him". "Blah blah blah premature optimization blah blah micro optimization blah blah blah." I essentially suggested we throw it on the servers and see if it causes a problem.

One giant long-running company meeting later, and I'm staring at 20 minutes until it's time to go home for the weekend. Honestly, my curiosity got the better of me and I quickly and quietly setup a little Benchmark.

Much to my surprise, my changed version performed better! Around 30% better!

Granted, I only performed the operation for only lowercased ID + session strings. Once I realized the folly of my ways and also tested properly uppercased ID + session strings, the results were closer.

But still it intrigued me.

Here's the benchmark code:

#!/usr/bin/perl use strict; use warnings; use Benchmark qw/:all/; my $TIMES = 500000; my $upper_id_string = "ABCD1EFGHI2J.a01f2345b067cde890f12gab345678c9"; my $lower_id_string = "abcd1efghi2j.a01f2345b067cde890f12gab345678c9"; sub only_tr { my $string = shift; my( $id, $session ) = split( /\./, $string, 2 ); $id =~ tr/a-z/A-Z/; return( $id, $session ); } sub old { my $string = shift; return split( /\./, $string, 2 ); } my $only_tr_low = sub { my( $id, $session ) = only_tr( $lower_id_string ) }; my $only_tr_upp = sub { my( $id, $session ) = only_tr( $upper_id_string ) }; my $old_low = sub { my( $id, $session ) = match_and_tr( $lower_id_string ) }; my $old_upp = sub { my( $id, $session ) = match_and_tr( $upper_id_string ) }; cmpthese( $TIMES, { 'Old way ' => $old_low, 'Only TR LC' => $only_tr_low, }, ); cmpthese( $TIMES, { 'Only TR UC' => $only_tr_upp, 'Old way U' => $old_upp, }, );

And the output:

Rate Old way Only TR LC Old way 280899/s -- -21% Only TR LC 357143/s 27% -- Rate Old way U Only TR UC Old way U 340136/s -- -4% Only TR UC 354610/s 4% --

From that, I determined that a micro-optimization was possible for the original code - but the results seem fairly clear to me... Performing the transliteration on every ID + session would be a net gain.

But why?

Adding the following (in the same manner) to the benchmark added to my mystery:

sub micro { my $string = shift; my( $id, $session ) = split( /\./, $string, 2 ); return( $id, $session ); }

Output:

Rate Old way Only TR LC Micro Optm Old way 277778/s -- -21% -28% Only TR LC 352113/s 27% -- -9% Micro Optm 387597/s 40% 10% -- Rate Old way U Only TR UC Micro OptU Old way U 340136/s -- -4% -12% Only TR UC 354610/s 4% -- -8% Micro OptU 384615/s 13% 8% --

Now the results match my expectation - performing a transliteration on everything is more expensive, but only if I use temporary lexical variables.

Checking B::Concise doesn't shed much light (for me, anyhow):

$ perl -MO=Concise -e ' sub foo { my $string = shift; my( $id, $sess ) = split( /\./, $string, 2 ); return( $id, $sess ); } foo( "abcd1efghi2j.a01f2345b067cde890f12gab345678c9" );' 7 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 4 -e:1) v ->3 6 <1> entersub[t2] vKS/TARG,1 ->7 - <1> ex-list K ->6 3 <0> pushmark s ->4 4 <$> const[PV "abcd1efghi2j.a01f2345b067cde890f12gab345678c +9"] sM ->5 - <1> ex-rv2cv sK/1 ->- 5 <#> gv[*foo] s ->6 -e syntax OK $ perl -MO=Concise -e ' sub foo { my $string = shift; return split( /\./, $string, 2 ); } foo( "abcd1efghi2j.a01f2345b067cde890f12gab345678c9" );' 7 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 3 -e:1) v ->3 6 <1> entersub[t2] vKS/TARG,1 ->7 - <1> ex-list K ->6 3 <0> pushmark s ->4 4 <$> const[PV "abcd1efghi2j.a01f2345b067cde890f12gab345678c +9"] sM ->5 - <1> ex-rv2cv sK/1 ->- 5 <#> gv[*foo] s ->6 -e syntax OK

I see one key difference in 2 <;> nextstate(main 3/4 -e:1 ->3 - is that it?

I'm hoping someone with more internals-fu than I can shed some light on this - if only to sate my insatiable curiosity.


--chargrill
s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)

Comment on Premature and micro optimization...
Select or Download Code
Re: Premature and micro optimization...
by lidden (Deacon) on Jun 11, 2007 at 07:53 UTC
    Adding the match_and_tr sub missing from your code:
    sub match_and_tr { my $string = shift; my( $id, $session ) = split( /\./, $string, 2 ); $id =~ /[a-z]/ and $id =~ tr/a-z/A-Z/; return( $id, $session ); }
    I got :
    Rate Old way Only TR LC Old way 193050/s -- -8% Only TR LC 209205/s 8% -- Rate Only TR UC Old way U Only TR UC 210084/s -- -2% Old way U 213675/s 2% --

    Which is as expected. The sub that don't do any tring is of course faster.
Re: Premature and micro optimization...
by shmem (Canon) on Jun 11, 2007 at 08:39 UTC
    Seems like you messed up your benchmarks, and your results might be bogus therefore ;-)

    There's a sub missing; adapting your benchmark code to what you wrote, with some subs added

    I get different results -

    - quite the opposite of what you report. From the first block of results, doing a lc on $_[0] and no assignments at all

    sub short_lc { split( /\./, lc($_[0]),2 ) }

    seems to be fastest, as expected. Lowercase vs. uppercase of its argument doesn't make any difference (it is too small to be reliable and could change with the order the anonsubs where introduced).

    update: changed some wording

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      You're absolutely right. :)

      I trimmed down my code before posting - without running it - which would've immediately pointed out the problem.

      See my post further on down the thread.


      --chargrill
      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
The problem with premature optimization...
by nferraz (Monk) on Jun 11, 2007 at 11:07 UTC

    You're talking about a sub which can be executed 300 thousand times per second, which means it will take something like 3 microseconds of computer time each time it is called. If you "optimize" it, it will take 1-2 microseconds; if you do more checking and validation, which actually improves your program, it will take 5-10 microseconds.

    Now, take a look on the rest of your program: accessing the database, processing your data, rendering the template probably take a another hundreds or thousands of microseconds!

    The fact is that, no matter how much you "optimize" this sub -- you could make it instantaneously-fast, or even remove it, and it wouldn't have a significant impact on the global performance at all.

    Moreover, if someone actually considers the idea of removing functionality to improve the local performance, that's the perfect example of premature optimization being the root of all evil.

      Yes - you're right. I agree, but sometimes I get curious. In fact, my first suggestion was that it would likely make such little difference that no harm would be done by making the change live and then dealing with "performance issues" later, knowing full well that this wouldn't become an issue.


      --chargrill
      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
      No doubt chargill will agree with you that this sort of micro-optimization is just plain silly, since he wrote
      To borrow a phrase from bobf, I "went all perlmonks on him". "Blah blah blah premature optimization blah blah micro optimization blah blah blah." I essentially suggested we throw it on the servers and see if it causes a problem.

      One giant long-running company meeting later, and I'm staring at 20 minutes until it's time to go home for the weekend. Honestly, my curiosity got the better of me and I quickly and quietly setup a little Benchmark.

      - but that's not the point of his post. It is about he being puzzled with why assignment of split results to two my-variables and returning those my-variables could be faster than directly returning the split results.

      Which isn't. :-)

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      The fact is that, no matter how much you "optimize" this sub -- you could make it instantaneously-fast, or even remove it, and it wouldn't have a significant impact on the global performance at all.

      That's a pretty narrow definition of 'global performance' there. Just because it's not the bottleneck in his application doesn't mean you should be happy to burn up CPU cycles endlessly while waiting for the database/network/user/whatever to complete.

      While I agree that premature optimization is a big waste of development time, at runtime some non-bottleneck optimization does add value. That CPU could be used by another process on the same server, or it could run cooler when there's nothing else to do.

      --
      [ e d @ h a l l e y . c c ]

Re: Premature and micro optimization...
by bart (Canon) on Jun 11, 2007 at 12:05 UTC
    Not only are you micro-optimizing, but you're looking, IMO, at the wrong spot. Inlining the code, thus not calling a sub, might produce a lot more savings in execution time, than skipping tr.

    And even then, you should consider the bigger picture: how much of your total execution time is spent in thi sub? Not much, that's my guess.

      Right - the purpose was not to micro-optimize.

      The purpose was that someone suggested that a simple change might have a negative impact on everything (needlessly applying tr/// on all IDs). I didn't think it would matter, I had time to kill, and I was curious if the impact would be measurable or not.


      --chargrill
      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
        Hmm, I must not have expressed myself clearly. It is not you, but your colleague who is guilty of (passive) micro-optimization, and focussing on a wrong thing.

        I agree that adding to the code and ending up with something that would appear to be faster, is indeed bizarre.

Re: Premature and micro optimization...
by chargrill (Parson) on Jun 11, 2007 at 12:11 UTC

    Wow.

    Thanks for thwapping me with a cluebat, all.

    Indeed, match_and_tr() was left out of what I posted here, as this was an attempt to appease my coworker, and I figured the OP was getting long enough. I copied and pasted precisely from my benchmark code, but ripped out the bits I thought I wasn't calling.

    But hey, it helped immediately show you, gentle reader, that I wasn't testing what I thought I was testing, because I accidentally forgot to change "match_and_tr" to "old" when I was initially copying & pasting my benchmark setup code, way back when I first setup the benchmark.


    --chargrill
    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: Premature and micro optimization...
by ikegami (Pope) on Jun 11, 2007 at 16:21 UTC

    How very odd!

    My testing on Win2k, ActivePerl 5.8.8 shows that the case of the data doesn't affect the timing, and that old_way is consistently just shy of 50% faster than with_tr.

    Upper case data =============== Rate with_tr old_way with_tr 183850/s -- -32% old_way 272269/s 48% -- Lower case data =============== Rate with_tr old_way with_tr 182262/s -- -33% old_way 270854/s 49% --

    Benchmark code:

    Of course, inlining the code saves a lot of time:

    Rate with_tr old_way with_tr_i old_way_i with_tr 180473/s -- -32% -39% -52% old_way 265506/s 47% -- -10% -29% with_tr_i 296058/s 64% 12% -- -20% old_way_i 372377/s 106% 40% 26% --

    Benchmark code:

Re: Premature and micro optimization...
by Dominus (Parson) on Jun 14, 2007 at 07:28 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://620413]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (13)
As of 2014-04-25 08:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (585 votes), past polls