 
PerlMonks  

Optimizing Output

by Dogma (Pilgrim)
on Apr 15, 2002 at 03:58 UTC ( id://159088 )

Dogma has asked for the wisdom of the Perl Monks concerning the following question:

It's often hard to find a balance between performance and writing readable code. We all know that it's more efficient to combine multiple "prints" into one statement. Further, according to Mastering Algorithms with Perl, it is "1.5%" faster to use a "," instead of a "." in combined print statements. But beyond that, what else is possible? It is often necessary to break up data output into different subs or to have logic mixed between statements. So is it more efficient (correct?) to store output in a scalar and then "print" it when you're done generating output, or to make several prints?

Of course this would be on a much larger scale...

$string .= "someoutput";
$string .= "someotheroutput";
# logic here
$string .= "moreoutput";
print $string;
vs.
print "someoutput", "someotheroutput";
# logic here
print "moreoutput";
Also how much will performance vary on different platforms and how large a difference will perl's buffering have?

Cheers,
-Dogma

Replies are listed 'Best First'.
Re: Optimizing Output
by dws (Chancellor) on Apr 15, 2002 at 04:57 UTC
    So is it more efficient (correct?) to store output in a scalar and then "print" it when you're done generating output, or to make several prints?

    Do the simplest thing that works. Then, if there proves to be a performance problem, profile (measure) before deciding what to do. Optimizing before you have data is almost always a waste of time. And once you have the data, you'll often find that the issue is algorithmic.

    I often gather up strings into a scalar for later printing, and though I've often daydreamed up super-efficient ways of doing this, the applications always seem to be either fast enough, or the performance issues are solved by doing a more effective query against the database.

Re: Optimizing Output
by Juerd (Abbot) on Apr 15, 2002 at 07:56 UTC

    To optimize output...

    • never believe people who say sys* functions are always faster because they are more direct. syswrite is not faster than print, because print benefits from Perl's internal optimization. (syswrite vs print: print wins (buffered: 580%, unbuffered: 250%))
    • do not turn off buffering (do not turn on autoflush). A lot of people seem to have a habit of writing $|++; in every single script. Most scripts do not need it. Use $| wisely. (print '' unbuffered vs buffered: buffered wins (90%))
    • write large chunks if you are on a slow medium. If, for some reason, you have to write to a file on an operating system that does no buffering, buffer yourself, and write large chunks. For normal scripts, this is not much of a problem and writing directly is probably more efficient than building a chunk.
    • do not try to interpolate function calls as described in How do I expand function calls in a string? (perlfaq4), but if you do, use ${\ ... } instead of @{[ ... ]} - if you need list context, join it yourself (and remember that a constant literal is faster than $").
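
    The two interpolation idioms above can be sketched side by side; `greeting` and `items` are made-up subs, purely for illustration:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    sub greeting { return 'Hello' }          # hypothetical subs,
    sub items    { return ('a', 'b', 'c') }  # just for illustration

    # Scalar context: ${\ ... } takes a reference to the result
    # and dereferences it in place
    my $s1 = "Greeting: ${\ greeting() }\n";

    # List context: @{[ ... ]} builds an anonymous array and
    # interpolates it, elements separated by $" (a space by default)
    my $s2 = "Items: @{[ items() ]}\n";

    # Joining yourself with a constant literal, as suggested above
    my $s3 = 'Items: ' . join(' ', items()) . "\n";

    print $s1, $s2, $s3;
    ```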

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      "If, for some reason, you have to write to a file on an operating system that does no buffering, buffer yourself, and write large chunks."

      This is odd. For this level of output, you're not looking so much at operating system buffering, but at the buffering your run-time environment provides. For Perl, this is given by the "normal" output functions (not sysread and syswrite). For C, this is e.g. the stdio.h functions. See K&R for details on how to implement putc and getc. It all happens in user space, not kernel space.

      The vast majority of benefit from buffering comes from this application library level. In fact, the cited benchmarks say just that: switching off Perl's buffering (with $|=0) turns off this buffering; it doesn't do any hacking on obscure OS parameters. And doing it clobbers performance.


      Juerd below is, of course, correct. You switch buffering off by making your output "piping hot": $|=1.

        switching off Perl's buffering (with $|=0)

        The other way around. $| controls autoflush, the opposite of buffering.

        $| = 1; # Autoflush on, buffering off.
        $| = 0; # Autoflush off, buffering on.
        $|++;   # $| = 1
        $|--;   # $| = !$| (flip setting 0/1)

        - Yes, I reinvent wheels.
        - Spam: Visit eurotraQ.
        

Re: Optimizing Output
by stephen (Priest) on Apr 15, 2002 at 05:49 UTC

    I threw together some bizarre benchmarking code to see how much the two methods differ on my machine. The difference was more than I expected... but I pretty much don't believe my results. Code below, once I finish explaining why one should ignore it completely. :)

    The old adage is that "Premature optimization is the root of all evil." Yet another true cliche is that 80% of processing is performed in 20% of all code. Optimizing before profiling the code to find out where it's spending time can waste your life away, and lead you to sacrifice readability and maintainability where it isn't necessary to do so.

    I second what dws said. I would go a step further, perhaps: write the code in the most readable and maintainable way you can. If one is going to be interpolating enough variables and subroutines that this question becomes worth thinking about, then it's time to consider using a templating system like The Template Toolkit or Text::Template. Many templating systems precompile themselves, so that they are nearly the same speed as either method.

    Just for laughs, here's a code snippet that performs a rough-and-ready (read: probably meaningless) comparison of building up a string for appending versus printing a long list:
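
    The snippet itself didn't survive in this copy of the thread; a minimal reconstruction, assuming nothing beyond the `build_print`/`list_print` labels that appear in the results posted downthread, might look like:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use File::Spec;

    # Write to the null device so terminal I/O doesn't dominate the timing
    open my $out, '>', File::Spec->devnull or die "Can't open null device: $!";

    my @chunks = ('x' x 80) x 100;  # made-up sample data

    # Build everything into one scalar, then print once
    sub build_print {
        my $string = '';
        $string .= $_ for @chunks;
        print {$out} $string;
    }

    # Hand print() the whole list in one call
    sub list_print {
        print {$out} @chunks;
    }

    cmpthese(1000, {
        build_print => \&build_print,
        list_print  => \&list_print,
    });
    ```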

      This was intended to be a question about optimizing the outputting of strings, not about when and what to optimize, as that's really a different discussion.

      Anyways here are the results of your benchmark on my laptop with linux-2.4.18/perl 5.6.1...

      build_print: 24 wallclock secs (20.87 usr + 0.08 sys = 20.95 CPU) @ 47.73/s (n=1000)
      list_print:  20 wallclock secs (17.51 usr + 0.01 sys = 17.52 CPU) @ 57.08/s (n=1000)
                    Rate build_print list_print
      build_print 47.7/s          --       -16%
      list_print  57.1/s         20%         --
      I suspect there are serious buffering differences from platform to platform.

      Adding "$|++" to the top of the script seems to widen the difference between methods by 1-2%.

Miscellaneous thoughts
by Fletch (Bishop) on Apr 15, 2002 at 04:07 UTC

    • Get it working first, optimize (or refactor if you want to sound hep :) afterwards.
    • Keep in mind that if you're interpolating variables you've implicitly used the . operator
    • Heredocs (<<EOT) are much more readable (IMHO) than fifteen gazillion .= lines (I mean come on, this is perl not C or Java)
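
    For instance, a single interpolated heredoc collapses a run of `.=` appends into one readable block (the report fields here are made up):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical report values, purely for illustration
    my ($item, $count, $price) = ('widgets', 42, '9.99');

    # One interpolated heredoc instead of a pile of .= statements;
    # \$ gives a literal dollar sign before the interpolated price
    my $report = <<"EOT";
    Item:  $item
    Count: $count
    Price: \$$price
    EOT

    print $report;
    ```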

    Update: As was pointed out to me in a msg, strictly speaking refactoring isn't optimization; however, refactoring may improve performance by replacing an inefficient implementation with a better-designed one. Pardon me for attempting humor. :)

Debug code (was Re: Optimizing Output)
by fuzzycow (Sexton) on Apr 15, 2002 at 14:19 UTC
    The other side of the coin is the usability of your output. If you are writing debugging code for your program, it's (in my opinion) better to write fewer lines with proper debug/trace information than to do 'print "X=$a"' every other line. For that I would recommend using something like the 'Log::Agent' module (which, btw, is great).
Re: Optimizing Output
by riffraff (Pilgrim) on Apr 15, 2002 at 17:29 UTC
    Doesn't it say somewhere that interpolating is faster than multiple .'s? If I remember correctly, each '.' forces a copy, whereas interpolation only does it once.

    $string .= $a . $b . $c . $d;

    does like 5 copies, whereas

    $string = "$string$a$b$c$d";

    only does one.

    Because of this, I always do interpolation if I can, but I won't go out of my way to do so.

      No.
      perl -Dt -e '"$a$b".$c.$d'
      EXECUTING...
      (-e:0) enter
      (-e:0) nextstate
      (-e:1) gvsv(main::a)
      (-e:1) gvsv(main::b)
      (-e:1) concat
      (-e:1) gvsv(main::c)
      (-e:1) concat
      (-e:1) gvsv(main::d)
      (-e:1) concat
      (-e:1) leave
      They both call the concat opcode.

      At one time, I remember the tokenizer actually rewrote "x$y" to be "x" . $y and "recursed" on it. But it doesn't seem to anymore. B::Deparse can even tell the difference:

      perl -MO=Deparse -e '"$a$b".$c.$d'
      "$a$b" . $c . $d;
Re: Optimizing Output
by BUU (Prior) on Apr 15, 2002 at 13:07 UTC
    Is there any nice module or script that would run through another script and determine what is taking the most time/CPU power?
      Kind of... It's called Benchmark.pm. It won't just read in a script and tell you what's fast and what's slow, but it is very helpful nonetheless.

      Do a search here for 'use Benchmark;' and you'll find lots of examples.
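
      A minimal `use Benchmark;` sketch, here re-testing the original question's ","-versus-"." claim (the null-device handle is just to keep the printed data off the screen):

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);
      use File::Spec;

      open my $out, '>', File::Spec->devnull or die "Can't open null device: $!";

      my ($x, $y, $z) = ('foo', 'bar', 'baz');

      # Compare concatenating before printing vs passing print a list
      cmpthese(100_000, {
          dot   => sub { print {$out} $x . $y . $z },
          comma => sub { print {$out} $x, $y, $z },
      });
      ```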

      s!!password!;y?sordid?binger?;y.paw.mrk.; print chr 0x5b;print;print chr(0x5b+0x2);

Node Type: perlquestion [id://159088]
Approved by belg4mit
Front-paged by grinder