Optimizing the bejeezus out of a sub

by sgifford (Prior)
on Jun 24, 2003 at 07:01 UTC
sgifford has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for some places I can squeeze a little bit more performance out of a sub. dprof says this sub is running 50% of the time for my application, so it looks like my best candidate for optimization.
# Print the given template sections to the supplied filehandle
sub printto {
    my $self = shift;
    my $fh = shift;
    my $ret = 0;
    my($val,$sec,$v);
    for(;@_;shift) {
        $sec = $self->[_sec]{$_[0]};
        if (!defined($sec)) {
            print $fh $self->_nosuchsec($_[0]);
            next;
        }
        $ret++;
        foreach $v (@$sec) {
            if ($v->[_type] == type_text) {
                print $fh $v->[_contents];
            }
            else {
                $val = $self->[_assign]{$v->[_contents]};
                print $fh defined($val) ? $val
                    : $self->_nosuchvar($val,"\$".$val);
            }
        }
    }
    $ret;
}
This is the output routine for a templating engine. It takes as parameters an object, a filehandle, and zero or more template section names to display. For each template section name, if the section exists it displays all parts of it. A part is either plain text, or else a variable; variables are looked up in the $self->[_assign] hashref. The _nosuchsec and _nosuchvar routines are error handlers, and are never called in normal operation.

Here's what DProf says:

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 51.9   21.53 21.280 130000   0.0002 0.0002  FTS::printto

Any ideas where to look for that last little bit of performance?

Thanks!

Re: Optimizing the bejeezus out of a sub
by dws (Chancellor) on Jun 24, 2003 at 07:15 UTC
    Any ideas where to look for that last little bit of performance?

    The first thing that jumped out when I looked over your code is the loop. It looks like something transcribed from C. Instead of

    for(;@_;shift) { ... ... $_[0] ...
    try doing something like
    foreach ( @_ ) { ... ... $_ ...
    This plays to Perl's strengths (and saves you a few array subscripts). Or, if you want to be explicit,
    foreach my $section ( @_ ) { ...
      My goal was to speed things up by avoiding copying each value from @_ to $section. While foreach is certainly more aesthetic, some benchmarking shows it to be about the same speed as my loop.

      I'm a little surprised by this, actually; I expected using @_ directly to always be faster, since it would avoid copying values.

        foreach my $section ( @_ ) { ... } doesn't copy values from @_ to $section. Inside the loop, $section is merely an alias to the currently selected element of @_. You can even modify @_ by assigning a new value to $section. Example:

        # adds 1 to all elements of the array
        foreach my $elem (@arr) { $elem++; }

        --
        Ilya Martynov, ilya@iponweb.net
        CTO IPonWEB (UK) Ltd
        Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
        Personal website - http://martynov.org

      I wrote about the C-style for() in Re: Re: Believably slow... It is slower than the foreach() style for; not significantly, but slightly. I could only remove a few operations from your code; I noted each spot with an ALTER comment. I think your time is consumed by those print() statements: IO is always expensive.

      ## Print the given template sections to the supplied filehandle
      sub printto {
          my $self = shift;
          my $fh = shift;
          my $ret = 0;
          my($val,$sec,$v);
          # ALTER - foreach over C-style for
          foreach my $sec_k (@_) {
              $sec = $self->[_sec]{$sec_k};
              # ALTER - drop unnecessary defined()
              unless ( $sec ) {
                  # ALTER - remove indirect object syntax
                  $fh->print( $self->_nosuchsec($sec_k) );
                  next;
              }
              $ret++;
              foreach $v (@$sec) {
                  if ($v->[_type] == type_text) {
                      # ALTER - remove indirect object syntax
                      $fh->print( $v->[_contents] );
                  }
                  else {
                      $val = $self->[_assign]{$v->[_contents]};
                      $fh->print( defined $val ? $val
                          : $self->_nosuchvar($val,"\$".$val) );
                  }
              }
          }
          return $ret;
      }

        Thanks for the suggestions! Here's what I found...

        I already changed the loops to use foreach; that seemed to be a little faster, and a lot more readable, so all benchmarks already include that change. Here's what the numbers looked like before the other two changes:

        Total Elapsed Time = 43.18577 Seconds
          User+System Time = 42.69577 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c  Name
         48.8   20.83 20.710 130000   0.0002 0.0002  FTS::printto
        

        Changing if(!defined($sec)) to unless($sec) seemed to slow printto down just a hair. This could just be measurement error; for some reason apparently other parts of the code sped up with this change, which again points to measurement error...

        Total Elapsed Time = 41.62356 Seconds
          User+System Time = 41.09356 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c  Name
         51.3   21.09 20.905 130000   0.0002 0.0002  FTS::printto
        

        Changing print $fh to $fh->print seemed to slow things down significantly. Using IO::Handle::print changes which routine the print calls get charged to, so the ExclSec for printto goes down, but it's made up for by all the calls to IO::Handle::print, and the CumulS and overall time go way up...

        Total Elapsed Time = 51.91479 Seconds
          User+System Time = 51.32479 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c  Name
         41.5   21.31 18.510 281000   0.0000 0.0000  IO::Handle::print
         38.9   19.97 35.589 130000   0.0002 0.0003  FTS::printto
        
Re: Optimizing the bejeezus out of a sub
by ant9000 (Monk) on Jun 24, 2003 at 07:16 UTC
    It's already bare-bones as far as I can tell. The only idea that comes to mind is to delay the print calls by building up a message string and printing it just once at the end of the sub. Depending on the amount of memory available (and the output expected), it might get you a tiny bit of a speed increase if your $fh is opened on disk.
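    Something along these lines, for instance (a sketch reusing the field names from the original sub; the _nosuchvar error path is left out for brevity):

        # Build the whole output in a string, then print once per call
        sub printto {
            my ($self, $fh, @names) = @_;
            my ($ret, $out) = (0, '');
            for my $name (@names) {
                my $sec = $self->[_sec]{$name};
                if (!defined $sec) { $out .= $self->_nosuchsec($name); next }
                $ret++;
                for my $v (@$sec) {
                    $out .= $v->[_type] == type_text
                          ? $v->[_contents]
                          : $self->[_assign]{$v->[_contents]};
                }
            }
            print $fh $out;    # a single print instead of one per part
            return $ret;
        }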
      Thanks, that did speed it up a little bit!

      Before:

      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       49.6   21.58 21.525 130000   0.0002 0.0002  FTS::printto
      
      After:
      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       48.8   20.83 20.710 130000   0.0002 0.0002  FTS::printto
      

        Does anybody know why this seems to make a difference? The print calls are of fairly short strings, and the filehandle I'm printing to is buffered (strace shows that each iteration is generating only one system call). I thought that printing a small string to a buffered filehandle would just append the string to the stdio buffer, which would be as fast as or faster than me buffering myself.

Re: Optimizing the bejeezus out of a sub
by hossman (Prior) on Jun 24, 2003 at 07:35 UTC
    dprof says this sub is running 50% of the time for my application, so it looks like my best candidate for optimization.

    Well, your application has to spend 50% of its time somewhere.

    If you've got 50 methods, and you're making 50 calls to each, and only one of those methods is 50% of your execution time, then you've got something to look into. But if your program consists of nothing more than some initialization, some config file parsing, a quick select from a DB, and then a single call to this method ... 50% is kicking some ass.

    Numbers are meaningless without units, and Percentages are meaningless without context.

      My point is that if I'm going to optimize anything, this is the sub to spend my time on. The other two candidates I've already tweaked, and even if I doubled their performance I'd still get only a modest gain. A clever insight on this sub could make a much bigger difference.

      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       48.8   20.83 20.710 130000   0.0002 0.0002  FTS::printto
       17.5   7.499  7.490  10000   0.0007 0.0007  FTS::addstring
       15.3   6.569  6.450 120000   0.0001 0.0001  FTS::assign
       6.77   2.892 42.220  10000   0.0003 0.0042  main::__ANON__
       2.55   1.090 43.310  10002   0.0001 0.0043  Benchmark::__ANON__
       2.44   1.040  1.030  10000   0.0001 0.0001  main::cleverlinks
       2.20   0.940  8.410  10000   0.0001 0.0008  FTS::parsehandle
       1.45   0.620  0.610  10000   0.0001 0.0001  IO::File::open
       1.38   0.590  1.650  10000   0.0001 0.0002  IO::File::new
       1.38   0.590 10.620  10000   0.0001 0.0011  FTS::parsefile
       1.03   0.440 11.410  10000   0.0000 0.0011  FTS::new_fromfile
       0.94   0.400  0.390  10000   0.0000 0.0000  FTS::new
       0.77   0.330  0.320  10000   0.0000 0.0000  Symbol::gensym
       0.42   0.180  0.480  10000   0.0000 0.0000  IO::Handle::new
       0.07   0.030  0.010  20000   0.0000 0.0000  IO::Handle::DESTROY
      
        Here's another thought: how about not calling your printto() and assign() methods quite so often?

        There's overhead in calling the subs in the first place. Doubly so for method calls. Cut down on the number of calls.

        Examine your algorithms to determine why it's necessary to call FTS::printto()/assign() an order of magnitude more times than anything else in your program. Can the work be put off, done in batches? Can you leave yourself helpful references in the data so that less looping/searching is necessary? (Fewer arrays, more hashes). Is there an opportunity to memoize functions somewhere? Can you prepare the data better during FTS::parsefile (or new_fromfile or something) so that it's more easily dealt with later on?
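        For the memoization idea, a minimal illustration (expensive_lookup is a hypothetical name for whatever pure, repeatedly-called function might be worth caching; printto itself prints, so it isn't a candidate):

            use Memoize;

            # Cache the return values of a pure function that is called many
            # times with the same arguments (hypothetical function name):
            memoize('FTS::expensive_lookup');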

        Beware of premature optimization

Re: Optimizing the bejeezus out of a sub
by rdfield (Priest) on Jun 24, 2003 at 12:46 UTC
    I remember BrowserUk doing some performance work on parameter passing, and the conclusion (IIRC) was to refer to the parameter array directly, rather than shifting.

    rdfield

Re: Optimizing the bejeezus out of a sub
by particle (Vicar) on Jun 24, 2003 at 12:55 UTC

    once you're done with the suggestions above, don't forget to add sub bejeezus () {} to get that bejeezus right out of there!

    ~Particle *accelerates*

Re: Optimizing the bejeezus out of a sub
by BrowserUk (Pope) on Jun 24, 2003 at 13:10 UTC

    The only thing left inside the routine that might be trimmable is the inner foreach loop. If the list of aliases generated by @$sec is of any size, then you may be paying a time penalty for building that list, and might realise some saving by using indexes to iterate over the array:

    for (0..$#{$sec}) { $sec->[$_][_type]... }

    Also, if _type,  _sec & _contents aren't already defined as constants (which are usually UPPERCASE by convention) then making them constants may yield another minor improvement.
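    If they aren't already constants, something like this would do it (the index values shown are assumptions about the actual array layout, not taken from the original code):

        use constant _type     => 0;   # position of the part's type flag (assumed)
        use constant _contents => 1;   # position of the part's text or variable name (assumed)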

    However, given that each call to the routine is only taking an average of 0.000165 seconds, there isn't exactly a lot of fat to be trimmed within the routine. The fact that you are spending 50% of your program's time, a little over 20 seconds, in there comes down to the fact that you are calling it 130,000 times.

    There are a few possibilities that might improve things.

    • Currently, even with the improvement of building your output string up within the routine and printing it once at the end of each call, you are still calling the built-in print 130,000 times. You might save a little by accumulating the output from all 130,000 calls in a string using something like

       $self->[_output] .= ... and then calling print once at the end of the (external) loop that is calling this 130,000 times (a brief sketch follows this list).

    • The other, and more likely candidate for cutting down your overall execution time is to reduce the number of times you call the routine. Whether that is possible depends on the nature of that external loop.
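    A brief sketch of that deferral (the _output field name comes from the snippet above; where exactly the final flush happens is an assumption about the calling code):

        # inside printto: append to a per-object buffer instead of printing
        $self->[_output] .= $v->[_contents];

        # ...after the external loop has made all of its calls:
        print $fh $self->[_output];
        $self->[_output] = '';        # reset the accumulated buffer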

    If there is no possibility of reducing the 130,000 calls to the routine, then you might save some by not calling the routine! That is to say, if you inlined the sub at the point(s) of call, you would save the cost of building the stackframe and calling it.

    Whether this is

    • possible, given the architecture of the application, and
    • desirable, given the probable redundancy that inlining might create and all the caveats that go with it,

    are both questions you must decide given your requirements. How much benefit is realisable will depend on the nature of the code that calls this routine.

    As an attempt to give some indication, the following (unrepresentative and possibly misrepresentative) benchmark shows that under some circumstances, inlining and/or deferring the printing, can yield some possibly significant benefits.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      The listification of @_ was optimized away by perl, so foreach accessed the array directly. Using offsets to counteract the not-actually-happening maloptimization is a maloptimization in and of itself.

        I said might!

        Can you substantiate that the listification is always optimised away?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Optimizing the bejeezus out of a sub
by xdg (Monsignor) on Jun 24, 2003 at 14:38 UTC

    The one thing that jumps out at me quickly is that the object structure has you doing a lot of dereferencing, array lookups, and hash lookups to get to your actual content. You might consider whether there are places where you can benefit from creating temporary references deeper into your structure before a loop, to avoid dereferencing through $self->[xxxx] each time through.
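    For instance, inside printto the [_assign] hash could be pulled into a lexical before the inner loop (a sketch using the field names from the original sub; the _nosuchvar error handling is elided):

        my $assign = $self->[_assign];   # one dereference, hoisted out of the loops
        foreach my $v (@$sec) {
            if ($v->[_type] == type_text) {
                print $fh $v->[_contents];
            }
            else {
                my $val = $assign->{ $v->[_contents] };
                print $fh defined($val) ? $val : '';   # _nosuchvar handling elided
            }
        }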

    xdg

Re: Optimizing the bejeezus out of a sub
by tall_man (Parson) on Jun 24, 2003 at 15:42 UTC
    Is this really what you want to do? If $val is defined, print it. Otherwise, pass undef and a string concatenated with undef to a subroutine. I suspect your code has never branched to that case, since it is almost certainly a bug.
    print $fh defined($val) ? $val : $self->_nosuchvar($val,"\$".$val);

      No, you're right; I introduced that error while optimizing. It should be:

      print $fh defined($val) ? $val : $self->_nosuchvar($v->[_contents], "\$".$v->[_contents]);
      Thanks!
Re: Optimizing the bejeezus out of a sub
by tilly (Archbishop) on Jun 24, 2003 at 17:54 UTC
    A number of comments, most of which shouldn't affect performance, but a couple of which might.
    1. Why are you predeclaring $val, $sec, and $v? Move the declaration inline. I doubt that there is any performance difference but it keeps someone (like me) from wondering whether, for instance, $v is global.
    2. Try assigning the count of @_ (scalar @_) to $ret, then decrementing $ret for skipped sections. Theoretically faster, though probably imperceptibly so.
    3. You could select $fh temporarily rather than printing to it each time (points 2 and 3 are sketched after this list).
    4. Why are you writing your own templating engine? I would be inclined to use one of the many that are already out there, several of which have been optimized far more than I would feel like doing myself.
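    A rough sketch of points 2 and 3 together (same data layout as the original sub; the _nosuchvar error handling is elided for brevity):

        sub printto {
            my $self = shift;
            my $fh   = shift;
            my $old  = select($fh);      # point 3: make $fh the default output handle
            my $ret  = scalar @_;        # point 2: start with the section count...
            for my $name (@_) {
                my $sec = $self->[_sec]{$name};
                if (!defined $sec) {
                    $ret--;              # ...and decrement for skipped sections
                    print $self->_nosuchsec($name);
                    next;
                }
                for my $v (@$sec) {
                    my $val = $v->[_type] == type_text
                            ? $v->[_contents]
                            : $self->[_assign]{ $v->[_contents] };
                    print $val;          # no "$fh" needed while it is selected
                }
            }
            select($old);                # restore the previous default handle
            return $ret;
        }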

      1: I did this to see if it helped performance. It didn't, but I didn't change it back before posting.

      2-3: I'll try these and see if they do anything interesting.

      4: Yeah, I know. I've got a seriously hacked-up copy of CGI::FastTemplate that I've been using for years, so it has a style that I've gotten used to and, more importantly, that the Web page designers I work with have gotten used to. It's also a PITA to maintain, so I rewrote it to be more legible (and, I hoped, faster, but it turned out not to be). I'm trying to decide whether I should keep my hacked-up copy of CGI::FastTemplate, use my rewritten replacement, or just use another engine. That got me started benchmarking different templating engines, which led to optimizing this sub. Maybe when I've done a little more research I'll Seek some Wisdom on the merits of the different templating engines...

      2: Didn't really help (a little slower, but probably just measurement error)

      Before:

      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       48.8   20.83 20.710 130000   0.0002 0.0002  FTS::printto
      

      After 2

      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       51.2   23.74 23.620 130000   0.0002 0.0002  FTS::printto
      

      3: Ditto.

      After 3

      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       49.5   21.00 20.880 130000   0.0002 0.0002  FTS::printto
      
Re: Optimizing the bejeezus out of a sub
by Elian (Parson) on Jun 24, 2003 at 18:59 UTC
    I dunno--you're clocking in at 200 microseconds an invocation. That's really not bad...

    It's not going to make a huge difference, but you might want to make sure that autoflush is off for the filehandle in $fh. It won't save much, but everything counts I suppose.
