
Re^2: Performance Trap - Opening/Closing Files Inside a Loop

by tmoertel (Chaplain)
on Dec 10, 2004 at 07:04 UTC ( #413774 )

in reply to Re: Performance Trap - Opening/Closing Files Inside a Loop
in thread Performance Trap - Opening/Closing Files Inside a Loop

tachyon, your code is likely to be faster not so much because it shaves away Perl cycles but because it will greatly reduce disk seeks, which are probably dominating L~R's run time. (See my other post in this thread for more on this.)

L~R: Assuming that you have the RAM, can you compare tachyon's code's run time against that of the other implementations? My guess is that tachyon's code will fare well. (If you don't have the RAM, just tweak the code to process, say, 100_000 lines per pass and clear out %fh between passes. You'll also need to open the files in append mode so that later passes don't clobber the output of earlier ones.)
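A minimal sketch of that batched variant, assuming a tab-delimited input of key/data lines and hypothetical `out_<key>.txt` output names (both are illustrative assumptions, not L~R's actual format). Lines are buffered per key; every `$batch_size` lines the buffers are appended to disk and cleared, capping memory use:

```perl
use strict;
use warnings;

# Hypothetical sketch: buffer lines per key, flush every $batch_size
# lines in append mode, and clear %buffer between passes.
sub process_lines {
    my ( $lines, $batch_size, $dir ) = @_;
    my ( %buffer, $pending );

    my $flush = sub {
        for my $key ( keys %buffer ) {
            # '>>' (append) so a later pass doesn't clobber an earlier one
            open my $fh, '>>', "$dir/out_$key.txt"
                or die "Can't append to $dir/out_$key.txt: $!";
            print $fh @{ $buffer{$key} };
            close $fh or die "Can't close $dir/out_$key.txt: $!";
        }
        %buffer  = ();    # free the RAM between passes
        $pending = 0;
    };

    for my $line (@$lines) {
        my ($key) = split /\t/, $line;
        push @{ $buffer{$key} }, $line;
        $flush->() if ++$pending >= $batch_size;
    }
    $flush->() if $pending;    # final partial batch
}
```

Because each flush reopens the files, this trades some extra open/close calls for bounded memory; the batch size controls that trade-off.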



Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by tachyon (Chancellor) on Dec 10, 2004 at 08:26 UTC

    I agree that reducing the number of seeks is vital. Given an average seek time of 3 ms, you can do only about 333 seeks per second, which is of course glacial. Ignoring buffering, the original code effectively needed two seeks (or more) per line, and the improved version still required at least one seek per line. In the example I presented, the number of seeks required is a function of the number of files we need to create, not the number of lines in the input file. This is a significant improvement provided that the number of unique files is less than the number of input lines.
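A minimal sketch of the cached-filehandle approach described above (the input format, key/tab-delimited lines, and the `out_<key>.txt` naming are illustrative assumptions). Each unique output file is opened exactly once and its handle cached in a hash, so the number of `open` calls scales with the number of distinct files rather than the number of input lines:

```perl
use strict;
use warnings;

# Sketch: cache one open filehandle per output file so that open()
# (and its attendant seeks) happens once per file, not once per line.
sub demux {
    my ( $lines, $dir ) = @_;
    my %fh;    # key => cached filehandle

    for my $line (@$lines) {
        my ($key) = split /\t/, $line;
        $fh{$key} ||= do {
            open my $h, '>', "$dir/out_$key.txt"
                or die "Can't open $dir/out_$key.txt: $!";
            $h;
        };
        print { $fh{$key} } $line;
    }
    close $_ for values %fh;
}
```

Note the usual caveat: with many distinct keys this can exhaust the per-process open-file limit, which is where the batched variant above comes in.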



Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by Limbic~Region (Chancellor) on Dec 10, 2004 at 15:38 UTC
    I had thought about this myself after posting. The reason I didn't give it much initial thought is that the Java developer made it clear I was not welcome in the sandbox. My guess is that some sort of limited buffer would be best, since that is still a whole lot of lines to keep in memory.

    Cheers - L~R

      If that is the case, combine the two methods:
      1. Buffer the strings per file until a buffer reaches, say, 10k or more.
      2. Once a buffer hits that size, look up an open filehandle in the cache, or open and cache a new one.
      3. Print the buffered string to the file and clear the buffer.
      4. Finish by flushing any remaining buffers.
      my @a=qw(random brilliant braindead); print $a[rand(@a)];
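The four steps above can be sketched as follows (again assuming tab-delimited key/data lines and hypothetical `out_<key>.txt` names). Strings accumulate per key; once a buffer exceeds the threshold it is written through a cached filehandle and emptied, and any leftovers are flushed at the end:

```perl
use strict;
use warnings;

# Sketch of the combined method: per-key string buffers (step 1),
# flushed through cached filehandles once they hit $max_buf bytes
# (steps 2-3), with a final flush of the remainders (step 4).
sub demux_buffered {
    my ( $lines, $dir, $max_buf ) = @_;
    my ( %fh, %buf );

    my $flush = sub {
        my ($key) = @_;
        $fh{$key} ||= do {
            open my $h, '>', "$dir/out_$key.txt"
                or die "Can't open $dir/out_$key.txt: $!";
            $h;
        };
        print { $fh{$key} } $buf{$key};
        $buf{$key} = '';
    };

    for my $line (@$lines) {
        my ($key) = split /\t/, $line;
        $buf{$key} .= $line;
        $flush->($key) if length( $buf{$key} ) >= $max_buf;
    }
    $flush->($_) for grep { length $buf{$_} } keys %buf;
    close $_ for values %fh;
}
```

The buffer threshold bounds memory per key while keeping writes large and sequential, so most lines cost neither an open nor a seek.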
