PerlMonks  

Optimizing existing Perl code (in practise)

by JaWi (Hermit)
on Aug 18, 2002 at 21:38 UTC
JaWi has asked for the wisdom of the Perl Monks concerning the following question:

My fellow monks, I'm asking for your greater knowledge once again!

I have been coding in Perl for about 3-4 years now, and I never paid attention to how to write fast Perl code. I recently started to rethink my code and snippets, and began wondering about the performance of different approaches to the same functionality.
Most documents about Perl stress the various ways of writing code, but not the performance of those approaches (may I therefore assume that these different ways don't affect the program's performance?)

Now for the real question: how do you, my fellow monk, optimize your Perl code? Do you follow specific approaches, or explicitly avoid specific structures in your code? Or is it all magic?

My sincere gratitude,

-- JaWi

"A chicken is an egg's way of producing more eggs."

Re: Optimizing existing Perl code (in practise)
by atcroft (Monsignor) on Aug 18, 2002 at 22:07 UTC
(jeffa) Re: Optimizing existing Perl code (in practise)
by jeffa (Chancellor) on Aug 18, 2002 at 22:21 UTC
    If it has been said once, it has been said a thousand times: "beware of premature optimization!" Ask yourself, "does this really need to be faster? Really?"

    I think a very important thing to optimize for is code maintainability: how easy is it to extend your program and fix bugs that break your code?

    So, how do I optimize my Perl code? I generally don't (but I do try to get it right the first time: measure twice, cut once). If I do, it is to replace areas of wheel re-invention with CPAN modules, or to refactor items into classes to improve robustness. If I wanted faster code I would port it to C instead, but since most of what I write relies on database and web servers, Perl is not the bottleneck 90% of the time.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      The fastest script in the world is worthless if a change in your
      system's directory structure breaks your code and you can't fix it.

      If you bother to optimize for anything, do it for maintainability.
      But never forget "monitorability".

      Unless your script is being called to do huge jobs, or your resources
      are very restricted (Sparc Ultra 1 or Intel 486, etc.) optimization for
      speed is not usually that big an issue.

      However, thorough and correct logging of events, meaningful commentary
      in the script itself, reusability of the code; these will all help
      with maintainability.

        Another, and I think far more common, case where optimization for speed is rightfully desirable is the code that drives a dynamic website. See perrin's impressive eToys success story for an admittedly extreme example; when you're facing a million pageviews an hour, you don't want your code to be wasting time, but even much lesser loads make speed an important goal. Nevertheless, of course, it does not override the factor of maintainability.

        Makeshifts last the longest.

Re: Optimizing existing Perl code (in practise)
by sauoq (Abbot) on Aug 18, 2002 at 22:50 UTC
    The real trick isn't optimizing your code but optimizing your solution. Maybe you can write the same code three different ways, but if that code implements an O(N²) algorithm when there is an O(N) algorithm that will do the job, it doesn't matter much whether you shave a few microseconds off each iteration.

    Successfully choosing the right algorithm takes careful consideration of the problem. If there is a secret to it at all it's probably choosing the right representation for your data. How to do that is a matter of experience and education. There isn't a cookbook solution to it because it usually depends greatly upon details of the problem you need to solve.
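To make this concrete, here is a minimal sketch (not from the original post; the sub names are invented) that finds the elements two lists have in common in two ways: a nested loop, which is O(N×M), and a hash lookup, which is roughly O(N+M). For large lists, micro-tuning the nested loop is unlikely to catch up with simply switching to the hash.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# O(N*M): for every element of @$x, scan @$y until a match is found.
sub common_nested {
    my ($x, $y) = @_;
    my @common;
    for my $i (@$x) {
        for my $j (@$y) {
            if ($i eq $j) { push @common, $i; last }
        }
    }
    return @common;
}

# O(N+M) on average: index @$y in a hash once, then do constant-time lookups.
sub common_hash {
    my ($x, $y) = @_;
    my %in_y;
    @in_y{@$y} = ();    # hash slice: every element of @$y becomes a key
    return grep { exists $in_y{$_} } @$x;
}

my @x = (1 .. 500);
my @y = map { $_ * 2 } 1 .. 500;

my @c = common_hash(\@x, \@y);
print scalar(@c), " common elements\n";   # 250 common elements

# Compare the two algorithms on the same data.
cmpthese(-1, {
    nested => sub { my @r = common_nested(\@x, \@y) },
    hash   => sub { my @r = common_hash(\@x, \@y) },
});
```

Both subs return the common elements in the order they appear in the first list, so they can be swapped for each other.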

    -sauoq
    "My two cents aren't worth a dime.";
    
      The real trick isn't optimizing your code but optimizing your solution.

      this is a good paraphrase of my answer... hell yeah, i optimize my code... but it's the data bottlenecks i optimize, not the millisecond differences you'd only get by porting to C. when all i get is milliseconds, i purely write for readability/security/correctness, not speed.

      i try to understand what my database does to retrieve/store data, so the requests i make of it can use indices, not full searches. i try to dump the results of a query into a perl hash if i reuse it, to avoid requerying. i try to keep the data ordered in such a way as to avoid the actual data structure causing problems, etc.
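A minimal sketch of that caching idea, assuming a hypothetical run_query() that stands in for a real (and slow) database call such as one made through DBI; the counter only shows how often the "database" is actually hit.

```perl
use strict;
use warnings;

my $query_count = 0;   # counts how often the "database" is really queried

# Hypothetical stand-in for an expensive database query; here it just
# fakes a result so the example runs without a database.
sub run_query {
    my ($id) = @_;
    $query_count++;
    return "name-$id";
}

# Cache of previous results, keyed by the query parameter.
my %cache;

sub lookup_name {
    my ($id) = @_;
    # Only query the first time this id is seen; afterwards reuse the hash.
    $cache{$id} = run_query($id) unless exists $cache{$id};
    return $cache{$id};
}

lookup_name($_) for (1, 2, 1, 1, 2, 3);
print "queries run: $query_count\n";   # queries run: 3
```

The trade-off is that cached results can go stale if the underlying table changes while the program runs.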

      i find that for what i want to do one or both of {perl, mysql} always has the data structure i want (in terms of efficiency of search, add, delete), so i can almost always avoid doing the hard work. that makes for efficiency of development on several levels. by knowing perl/mysql i have an easy answer to all my data structure problems, i just have to use the struct, not build one first. and it is readable by others simply by virtue of being standardized (i.e. SQL-92)

      second, if there is an error in the data structure, the easy-to-fix ones are my fault (i used the struct wrong), and the hard ones are someone else's. if i'm using, say, some feature new to mysql 4.0.0alpha and it breaks, we just add it to the bug list and wait for 4.0.1 -- the fix is free, and neither i nor my employer has to pay for me to debug, rewrite, debug, etc...

      if i were the guy writing mysql it would be a different story, but it's not like they write it in perl either...
Re: Optimizing existing Perl code (in practise)
by derby (Abbot) on Aug 18, 2002 at 22:59 UTC
    check out Effective Perl Programming - it's a great resource that shows you idiomatic perl, which 4 times out of 5 is faster than most of the other ways in TMTOWTDI.

    -derby

Re: Optimizing existing Perl code (in practise)
by semio (Friar) on Aug 19, 2002 at 06:44 UTC
    I found myself asking this question too, based on feedback I received on a question I recently posted: converting hex to char. In that thread, unpack and printf were presented as options for converting data. To test the performance of each, I did the following:
    #!c:/perl/bin/perl -w
    use strict;
    use POSIX qw(strftime);
    my $x;
    my $maxint = 200000;
    my $start = strftime "%H:%M:%S", localtime;
    for ($x = 0; $x < $maxint; $x++) { print unpack "H*", "abc" }
    my $finish = strftime "%H:%M:%S", localtime;
    print "$start $finish";
    Results: 01:32:57 01:33:48 (51 seconds)
    #!c:/perl/bin/perl -w
    use strict;
    use POSIX qw(strftime);
    my $x;
    my $maxint = 200000;
    my $start = strftime "%H:%M:%S", localtime;
    for ($x = 0; $x < $maxint; $x++) { printf "%x%x%x", ord('a'), ord('b'), ord('c'); }
    my $finish = strftime "%H:%M:%S", localtime;
    print "$start $finish";
    Results: 01:31:56 01:32:50 (54 seconds)

    In this case, unpack is the clear winner, although the performance difference doesn't become apparent until after 100000 iterations. So, in my opinion, since TIMTOWTDI, I would look for a performance difference between these methods and opt for the one that requires the least execution time.

    The second thing I would check is whether any shelling out can be replaced by an available Perl function. I recently wrote a program that required updating the date/time stamps in a log file. For this, I made the mistake of relying on shelling out:

    my $time1 = `date '+%H:%M:%S'`;
    when I should have used

    use POSIX qw(strftime);
    my $time1 = strftime "%H:%M:%S", localtime;
    Hope this helps.

    cheers, -semio

      You should definitely look into Benchmark. I was able to reduce your test down to this, and it reports the CPU usage:

      use strict;
      use Benchmark;
      timethese(1500000, {
          'unpack'  => 'unpack "H*", "abc"',
          'sprintf' => 'sprintf "%x%x%x",ord("a"),ord("b"),ord("c")'
      });

      The Results:
      Benchmark: timing 1500000 iterations of sprintf, unpack...
         sprintf:  0 wallclock secs ( 0.17 usr +  0.00 sys =  0.17 CPU) @ 8823529.41/s (n=1500000)
                  (warning: too few iterations for a reliable count)
          unpack: 10 wallclock secs ( 9.87 usr +  0.01 sys =  9.88 CPU) @ 151821.86/s (n=1500000)

      ACCCK!!! Abigail-II caught me in a late-night brain seizure. I should have been tipped off by sprintf winning. :( ++Abigail-II



      grep
      Mynd you, mønk bites Kan be pretti nasti...
        You should always be very suspicious if your benchmark shows results of 8823529.41 runs/second, especially when it comes to non-trivial tasks like sprintf(); after all, that requires perl to parse a format.

        Another thing that should ring loud bells is that you are doing sprintf() in void context. That's not a natural operation. Perhaps Perl optimizes that away for you - totally screwing up your benchmark. It's a simple test:

        $ perl -MO=Deparse -wce 'sprintf "%x%x%x", ord ("a"), ord ("b"), ord ("c")'
        Useless use of a constant in void context at -e line 1.
        BEGIN { $^W = 1; }
        '???';
        -e syntax OK
        $
        Indeed, you just benchmarked how fast perl can do an empty loop. Not very useful. Your benchmark should include assigning the result to a variable. So, you might want to do:
        #!/usr/bin/perl
        use strict;
        use warnings 'all';
        use Benchmark;

        timethese -10 => {
            unpack  => '$_ = unpack "H*" => "abc"',
            sprintf => '$_ = sprintf "%x%x%x", ord ("a"), ord ("b"), ord ("c")',
        }

        __END__
        Benchmark: running sprintf, unpack for at least 10 CPU seconds...
           sprintf: 11 wallclock secs (10.25 usr +  0.00 sys = 10.25 CPU) @ 775053.56/s (n=7944299)
            unpack: 11 wallclock secs (10.48 usr +  0.01 sys = 10.49 CPU) @ 331145.09/s (n=3473712)
        It looks like sprintf is still a winner. But is it? Let's check the deparser again:
        $ perl -MO=Deparse -wce '$_ = sprintf "%x%x%x", ord "a", ord "b", ord "c"'
        BEGIN { $^W = 1; }
        $_ = '616263';
        -e syntax OK
        $
        Oops. Perl is so smart, it figured out at compile time the result of the sprintf. We'd have to make the arguments of sprintf variable to make Perl actually do work at run time:
        $ perl -MO=Deparse -wce '($a, $b, $c) = split // => "abc"; $_ = sprintf "%x%x%x", ord $a, ord $b, ord $c'
        BEGIN { $^W = 1; }
        ($a, $b, $c) = split(//, 'abc', 4);
        $_ = sprintf('%x%x%x', ord $a, ord $b, ord $c);
        -e syntax OK
        $
        And only now we can run a fair benchmark:
        #!/usr/bin/perl
        use strict;
        use warnings 'all';
        use Benchmark;
        use vars qw /$a $b $c $abc/;

        $abc = "abc";
        ($a, $b, $c) = split // => $abc;

        timethese -10 => {
            unpack  => '$_ = unpack "H*" => $::abc',
            sprintf => '$_ = sprintf "%x%x%x", ord $::a, ord $::b, ord $::c',
        }

        __END__
        Benchmark: running sprintf, unpack for at least 10 CPU seconds...
           sprintf: 11 wallclock secs (10.51 usr +  0.01 sys = 10.52 CPU) @ 208379.75/s (n=2192155)
            unpack: 10 wallclock secs (10.10 usr +  0.00 sys = 10.10 CPU) @ 323836.04/s (n=3270744)
        And guess what? unpack is the winner!

        The moral: no benchmark is better than a bad benchmark.

        Abigail

      my $time1 = strftime "%H:%M:%S", localtime;
      You mean s/localtime/time/ of course.

      Makeshifts last the longest.

        You mean s/localtime/time/ of course.

        I sure hope he doesn't. From the POSIX perldoc page:

        Synopsis: strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1)
        Those are the same values as returned by localtime().
      The second thing I would check is whether any shelling out can be replaced by an available Perl function. I recently wrote a program that required updating the date/time stamps in a log file. For this, I made the mistake of relying on shelling out

      This one particular piece of advice is very good. A peave of mine is when I see people who write Perl scripts and all the work in them is done by using system() calls. What is the point in writing a Perl script if you're not going to use the Perl functions? You might as well write the thing in shell.

      Spawning system calls does take more resources and thus it behooves the Perl programmer to try and code the functionality they want using Perl built-ins and modules.

      gj! ++ on this one.

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

        A peave of mine is when I see people who write Perl scripts and all the work in them is done by using system() calls. What is the point in writing a Perl script if you're not going to use the Perl functions? You might as well write the thing in shell.
        And a "peave" of me is people who see everything black-and-white. I've written Perl programs where the majority of the work was done using "system". What's the point of using a glue language, and not gluing? You might as well write the thing in C.

        Your point of view is quite the opposite of the viewpoint of "code reuse". Unix comes with a handy toolkit. There's nothing wrong with using it.

        You might as well write the thing in shell.
        Not always. Perl gives you more control flow syntax than a shell.
        Spawning system calls does take more resources and thus it behooves the Perl programmer to try and code the functionality they want using Perl built-ins and modules.
        Bull. Programming means making trade-offs between developer time and run-time. The fact that you have chosen Perl instead of, say, C means that you strongly favour developer time over run time. Your arguments make sense if you are a C coder, but for a Perl coder they are just silly.

        Really, what's the point of writing:

        my $text = do { open my $fh => $file or die "open: $!\n"; local $/; <$fh>; };
        If you can just write:
        my $text = `cat $file`;
        Most programs won't read in gazillions of files in a single program, so the extra overhead is minute. Far less than the sacrifice you already made by using Perl instead of C. I also prefer
        system mkdir => -p => $dir;
        over the Perl equivalent. It takes to long to figure out which module implemented it, and to download and install it.

        Of course, making use of external programs makes you less portable, but so does making use of modules not coming with the core. And many programs dealing with file names aren't portable anyway. Do you always use File::Spec when dealing with file names? I certainly don't.

        I'm not claiming everything should be done with system. Not at all. But I don't think that everything that can be done in Perl should be, and that therefore system should be avoided.

        Abigail

Re: Optimizing existing Perl code (in practise)
by JaWi (Hermit) on Aug 19, 2002 at 10:05 UTC
    Fellow Monks, I thank you for all the replies! I will now retreat and rethink my coding style.
    Your answers have set me thinking... and it hurts :-)

    Greets to all,

    -- JaWi

    "A chicken is an egg's way of producing more eggs."

Re: Optimizing existing Perl code (in practise)
by gmpassos (Priest) on Aug 19, 2002 at 11:47 UTC
    Well, if you really want to make some code faster, write it as XS, in other words write it in C. But this is only worthwhile for filters, encryption routines, etc.

    To gain speed, you can test your code, especially inside loops and other pieces that will be run many times, to find the best way to write it! Here are some tips:

    Variables:
    Don't use:
    $var = $var . "add" ;
    The best way is:
    $var .= "add" ;
    The first (wrong) way rewrites the whole variable in memory; the second only appends the new data. Use the same idea for += , -= , *= , /=
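Rather than taking this on faith, it can be measured with the core Benchmark module. A minimal sketch follows (sub names are invented); results depend on the perl version, and a given perl may already optimize the $var = $var . "add" form, so treat any numbers as specific to your own setup.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a string by re-assigning the concatenation on every step.
sub build_reassign {
    my ($n) = @_;
    my $var = '';
    $var = $var . "add" for 1 .. $n;
    return $var;
}

# Build the same string by appending in place with .=
sub build_append {
    my ($n) = @_;
    my $var = '';
    $var .= "add" for 1 .. $n;
    return $var;
}

# Both must produce identical results before the timing means anything.
die "mismatch\n" unless build_reassign(1000) eq build_append(1000);

cmpthese(-1, {
    reassign => sub { build_reassign(10_000) },
    append   => sub { build_append(10_000) },
});
```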

    For subs, use the contents of @_ directly, especially for big data passed to the function. If you want speed, use $_[0] first; if you need to change the data inside $_[0], use my ($var) = @_ ;, and if you have big data, use shift.
    Don't use for big data:
    sub { my ($var1,$var2) = @_ ; }
    The best way is to use $_[0] itself or shift:
    sub { my $var1 = shift ; my $var2 = shift ; }
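    * $_[0] is an alias for the caller's argument: assigning to it changes the caller's variable (or dies if a constant was passed), so copy it into a $scalar first if you need to modify it.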
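A small sketch of how @_ aliasing behaves (the sub names are invented for the example): writing to $_[0] changes the caller's variable, copying into a lexical first does not, and writing to $_[0] dies when the caller passed a constant.

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's argument, not a copy.
sub shout_in_place { $_[0] = uc $_[0]; return }

# Copying into a lexical first leaves the caller's variable untouched.
sub shout_copy { my ($s) = @_; return uc $s }

my $word = "hello";
shout_in_place($word);
print "$word\n";              # HELLO (the caller's variable changed)

my $other = "world";
my $loud  = shout_copy($other);
print "$other $loud\n";       # world WORLD

# A literal constant is read-only, so writing to its alias dies.
eval { shout_in_place("literal") };
print "error: $@" if $@;
```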

    If you have a loop (while, for, foreach) that will be run many times, try not to declare variables with my inside it:
    Normal way: for(0..10) { my $var = $_ ; }
    Faster:
    my $var ;
    for(0..10) { $var = $_ ; }
    * Of course, this only makes a noticeable difference if you move the my declarations of all the variables outside the loop, in other words for larger loop bodies.
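A hedged way to check this tip on your own perl is a Benchmark sketch like the one below (sub names are invented). Any gain is likely to be small, and declaring the variable inside the loop keeps its scope tight, so it is worth measuring before trading one for the other.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 1000);

# Declares a fresh lexical on every iteration.
sub sum_my_inside {
    my $sum = 0;
    for (@data) { my $var = $_ * 2; $sum += $var }
    return $sum;
}

# Declares the lexical once, outside the loop, and only reassigns it.
sub sum_my_outside {
    my $sum = 0;
    my $var;
    for (@data) { $var = $_ * 2; $sum += $var }
    return $sum;
}

# Sanity check: both versions must compute the same thing.
die "mismatch\n" unless sum_my_inside() == sum_my_outside();

cmpthese(-1, {
    my_inside  => \&sum_my_inside,
    my_outside => \&sum_my_outside,
});
```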

    Don't use local(); my() is faster! In early Perl, local() was used the way my() is used now; today it is mainly useful for localizing *HANDLES and special variables such as $/, not ordinary variables.
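local() is not just a slower my(): it gives a package variable a temporary value for everything called during its dynamic scope, which is why idioms such as local $/ (used for slurping files elsewhere in this thread) still need it. A minimal sketch with invented names:

```perl
use strict;
use warnings;

our $level = 'global';          # package variable, so it can be local()ized

sub report { return $level }    # sees whatever value of $level is current at run time

sub with_local {
    local $level = 'local';     # dynamic scope: visible to subs called from here
    return report();
}

sub with_my {
    my $level = 'my';           # lexical scope: invisible to report()
    return report();
}

print with_local(), "\n";       # local
print with_my(), "\n";          # global
print report(), "\n";           # global (the local value was restored on exit)
```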

    Prefer variables in this order: $scalar, @array, %hash. Sometimes we use %h or @a when they aren't needed, but they are slower than $s and use more memory, especially %h!

    About regular expressions (REs): use them only when needed! Don't write if($var =~ /^x\z/) if you can write if($var eq 'x'). But sometimes an RE can be faster than longer code; the best way to choose is to benchmark both.
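A minimal Benchmark sketch for this comparison (sub names are invented). It uses /^x\z/ rather than /x/, since only the anchored pattern means exactly the same as eq 'x', and it reads its inputs from a variable so perl cannot fold the work away at compile time.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Inputs live in a variable so nothing can be computed at compile time.
our @inputs = ('x', 'y', 'xx', 'abc');

sub count_regex { my $n = 0; for (@inputs) { $n++ if /^x\z/ } return $n }
sub count_eq    { my $n = 0; for (@inputs) { $n++ if $_ eq 'x' } return $n }

# Check equivalence first: a benchmark of two different computations is useless.
die "not equivalent\n" unless count_regex() == count_eq();

cmpthese(-1, {
    'regex'     => \&count_regex,
    'string_eq' => \&count_eq,
});
```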

    But always remember that each tip here will only save you a few microseconds. Only spend time improving speed in the pieces of your code that really need it! Always try to use what the core provides; don't reimplement things Perl can already do itself.

    "The creativity is the expression of the liberty".

      For subs use the contents of @_, especially for big data sent to the function:
      Don't use:
      sub { my ($var1,$var2) = @_ ; }
      The best way is to use $_[0] itself or shift:
      sub { my $var1 = shift ; my $var2 = shift ; }
      * If you use $_[?] you can't modify it! Use shift if you need to write to the var.

      I was pretty sure that was wrong when I read it, so I whipped out Benchmark:

      #!/usr/bin/perl -w
      use strict;
      use Benchmark qw(cmpthese);

      sub shifter {
          my $a=shift; my $b=shift; my $c=shift;
          my $d=shift; my $e=shift; my $f=shift;
          return $a*$b*$c*$d*$e*$f;
      }
      sub assigner {
          my ($a,$b,$c,$d,$e,$f)=@_;
          return $a*$b*$c*$d*$e*$f;
      }
      sub direct {
          return $_[0]*$_[1]*$_[2]*$_[3]*$_[4]*$_[5];
      }

      cmpthese(-5, {
          'shifter'  => sub {shifter(1,2,3,4,5,6);},
          'assigner' => sub {assigner(1,2,3,4,5,6);},
          'direct'   => sub {direct(1,2,3,4,5,6);},
      });
      Results:
      $ perl testSubs.pl
      Benchmark: running assigner, direct, shifter, each for at least 2 CPU seconds...
        assigner:  0 wallclock secs ( 2.06 usr +  0.02 sys =  2.08 CPU) @ 384577.33/s (n=800690)
          direct:  3 wallclock secs ( 2.04 usr +  0.00 sys =  2.04 CPU) @ 629222.22/s (n=1285501)
         shifter:  2 wallclock secs ( 2.09 usr +  0.00 sys =  2.09 CPU) @ 294563.31/s (n=616521)
                   Rate  shifter assigner   direct
      shifter  294563/s       --     -23%     -53%
      assigner 384577/s      31%       --     -39%
      direct   629222/s     114%      64%       --
      That's with perl 5.6.1... Maybe 5.8.0 optimized shift? But you'd have to keep the old values around and have a "front" entry in the AV, and I don't remember seeing anything about that.
      --
      Mike
        Hi,

        The "shift" option is good to use when you pass big data to the function! The operation itself is not fast, because it needs to remove the value from the array, reorder the array, and create and save a scalar variable! "shift" is good for big data because you don't keep the data in memory twice; you just move it into the scalar! If you want speed, use $_[0] first; if you need to change the data inside $_[0], use my ($var) = @_ ;, and if you have big data, use shift.

        "The creativity is the expression of the liberty".

Re: Optimizing existing Perl code (in practise)
by pingo (Hermit) on Aug 19, 2002 at 14:06 UTC
    For my part, I don't do much optimizing. Instead, I rely on FastCGI to make my Perl scripts fast enough (of course, this only applies to CGI).
Re: Optimizing existing Perl code (in practise)
by feloniousMonk (Pilgrim) on Aug 19, 2002 at 18:02 UTC
    I definitely think benchmarking is the key answer here.

    I think that, no matter what, this is an implementation-specific problem. I always wrote Perl for
    programmer speed and paid less attention to execution speed, until I started working on problems
    big enough to involve datasets ranging from hundreds of megabytes to a few gigabytes in size.

    I love Perl, but for data this big, and with the processing required, I would initially have gone
    with either C or C++. BUT I work in a place where almost everyone knows Perl and not many know C/C++,
    so Perl optimization has become a big issue.

    I've learned a lot about how slight code changes can increase efficiency, especially when
    certain tasks need to be done many times over. I've seen major speed increases
    just by benchmarking and trying a different solution, but keeping the same algorithm.
    Especially things like
    my @a = (); if ( $foo =~ /^(\d+)\s+(\w+)\s*$/ ) { @a = ($1, $2); }
    vs.
    my @a = split (/\s+/, $foo);


    Guess what? On my system, option #1 runs about 90% faster.
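Before timing two snippets, it helps to check what each one returns on your data. A small sketch (sub names and sample lines are invented) runs both versions side by side: they agree on a well-formed "number word" line but not on other input, which matters when comparing their speed.

```perl
use strict;
use warnings;

# Option 1: anchored regex with captures (accepts only "digits word").
sub parse_regex {
    my ($foo) = @_;
    my @a = ();
    if ($foo =~ /^(\d+)\s+(\w+)\s*$/) { @a = ($1, $2); }
    return @a;
}

# Option 2: split on whitespace (accepts any number of fields).
sub parse_split {
    my ($foo) = @_;
    my @a = split(/\s+/, $foo);
    return @a;
}

for my $line ("42 apples", "apples 42", "1 2 3") {
    my @r = parse_regex($line);
    my @s = parse_split($line);
    printf "%-10s regex=[%s] split=[%s]\n", $line, "@r", "@s";
}
```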


    -felonious --
      Those two code snippets are not at all similar in function, so benchmarking them is useless.
        Um, they do perform the same function. They both place 2 variables
        into an array....

        Yes, the method is different but what I intended to illustrate is that for a given set of data,
        2 different methods of processing may have significant performance differences
        while giving the same results.

        Also implicit in the code is that the solution will not work everywhere, which is why optimization depends on what
        you intend to optimize.

        -felonious --
Re: Optimizing existing Perl code (in practise)
by thoglette (Scribe) on Aug 20, 2002 at 12:29 UTC
    As others have said:
  • Write it and optimise only if it needs it
  • Get your algorithms right first
  • Ninety percent of your code will be fast enough; only certain blocks may need tweaking.

    Case in point: on a recent project with over half a megabyte of script and about 400 'instances', two 'instances' ran far too slowly. Most 'instances' ran in under 10 seconds, while these two required 60 minutes, which was unacceptable.
    An analysis (see the comments on monitoring) showed that we had the following:

    while(1)
    {
       $thing = new thing;
       $thing->method(getc());
       print $thing->result();
       $thing->DESTROY;
    }
    
    All very well and good, but our class was heavily inherited: new executed no fewer than 60 lines of code, including multiple function calls, and result went all the way up the tree to an AUTOLOAD handler. And all this for a 10-line method.

    So, about 120 lines of code (and about 20 @INC function calls) to do 10 lines of code.

    Time for some faster, locally optimised code AND VERY LOUD COMMENTS, both in the local code and in the class which was being 'broken'. The net result was a run time of about 10 seconds, which was acceptable for this project.
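A sketch of the general idea of hoisting work out of a hot loop, using an invented Thing class rather than the project's real one: build the object once and resolve the methods once with can(), instead of constructing an object and walking the inheritance tree for every character. This assumes method() fully resets the object's state on each call; also note that can() does not find methods that exist only through AUTOLOAD, so a case like the one above may still need locally inlined code.

```perl
use strict;
use warnings;

# Hypothetical class standing in for the heavily inherited "thing".
package Thing;
sub new    { my ($class) = @_; return bless { total => 0 }, $class }
sub method { my ($self, $c) = @_; $self->{last_char} = $c; return }
sub result { my ($self) = @_; return uc $self->{last_char} }

package main;

my @input = split //, "hello";

# Slow pattern: construct a fresh object (and look up methods) per character.
my $slow = '';
for my $c (@input) {
    my $obj = Thing->new;
    $obj->method($c);
    $slow .= $obj->result;
}

# Hoisted: build the object once and resolve the methods once with can().
my $thing  = Thing->new;
my $method = $thing->can('method');
my $result = $thing->can('result');
my $fast   = '';
for my $c (@input) {
    $thing->$method($c);
    $fast .= $thing->$result();
}

print "$slow $fast\n";   # HELLO HELLO
```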
    --
    Butlerian Jihad now!

Node Type: perlquestion [id://191025]
Approved by Ovid
Front-paged by hsmyers