PerlMonks  

Re: Performance Trap - Opening/Closing Files Inside a Loop

by kvale (Monsignor)
on Dec 10, 2004 at 00:05 UTC (#413721=note)


in reply to Performance Trap - Opening/Closing Files Inside a Loop

Assume that there is a reasonably small number of column names. Then just pre-open all the possible files before the main loop and write to the appropriate filehandle each time through the loop. Caching is your friend :)

-Mark
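A minimal sketch of the pre-open idea, shown in Python for illustration. A Perl version would have the same shape: a hash of filehandles opened before the loop. The sketch makes three assumptions that are not in the post. The input is tab-separated. The value that names the output file sits in a hypothetical column `COLUMN`. The full set of values is known up front (`KNOWN_VALUES` is a made-up example).

```python
import os
import tempfile
from contextlib import ExitStack

# Hypothetical assumptions for this sketch: tab-separated input, the output
# file name comes from column COLUMN, and every possible value is known ahead.
COLUMN = 1
KNOWN_VALUES = ("red", "green", "blue")


def split_preopened(lines, out_dir):
    """Open one handle per known value before the loop; each line is one dict lookup."""
    with ExitStack() as stack:  # closes every handle when the block exits
        handles = {
            v: stack.enter_context(open(os.path.join(out_dir, v + ".txt"), "w"))
            for v in KNOWN_VALUES
        }
        for line in lines:
            value = line.rstrip("\n").split("\t")[COLUMN]
            # Raises KeyError if a value appears that was not in KNOWN_VALUES.
            handles[value].write(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        split_preopened(["a\tred\n", "b\tblue\n", "c\tred\n"], d)
        print(sorted(os.listdir(d)))
```

All three files are created even if a value never occurs, which is the price of opening everything before knowing what the data contains.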


Re^2: Performance Trap - Opening/Closing Files Inside a Loop
by Limbic~Region (Chancellor) on Dec 10, 2004 at 00:24 UTC
    Mark,
    I am likely being thick, but I don't understand. The value of the column (not a column name) is what is being used as the file name. It is not possible to know the values in advance without going through every line of every file first. Even if you did that, you would still need to store the information in a hash so that you could later look up the filehandle corresponding to each value, so I see this as a slower variation on my proposed solution. What am I missing?

    Cheers - L~R
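For comparison, here is a sketch of the single-pass variant that the replies in this thread suggest. The value-to-filehandle hash is filled in "as you go", and each file is opened the first time its value is seen. This is Python again, with the same hypothetical `COLUMN` and tab-separated input. It also assumes that column values are safe to use as file names.

```python
import os
import tempfile
from contextlib import ExitStack

COLUMN = 1  # hypothetical: column whose value names the output file


def split_lazy(lines, out_dir):
    """Single pass: open a file on first sight of its value, then reuse the cached handle."""
    with ExitStack() as stack:  # closes all cached handles on exit
        handles = {}
        for line in lines:
            value = line.rstrip("\n").split("\t")[COLUMN]
            fh = handles.get(value)
            if fh is None:
                # First time this value appears: one open() for the whole run.
                path = os.path.join(out_dir, value + ".txt")
                fh = handles[value] = stack.enter_context(open(path, "w"))
            fh.write(line)
        return sorted(handles)  # the distinct values encountered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(split_lazy(["a\tred\n", "b\tblue\n", "c\tred\n"], d))
```

Every handle stays open until the end, so the number of distinct values must stay below the per-process open-file limit (on Unix-like systems, `ulimit -n`). This limit is the practical bound on the "reasonably small number" assumed above.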

      Ah, sorry, I wasn't clear. I assumed that one knew the (small) set of possible column values to be used as filenames. If you do not know this set of values in advance, my method may still be faster, but prescanning the table will add some time to the execution.

      Once you have established a hashmap from column values to filehandles, then you can print to the desired filehandle. I expect a single hash lookup to be much faster than a pair of system calls for opening and closing files; in addition to the OS bookkeeping and disk IO overhead for opening and closing, each file buffer is flushed (and, depending on the OS and filesystem, the disk is written to) for every line written.
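The cost argument is easier to see side by side. This sketch puts the trap (an open, append and close for every line) next to the prescan variant. The prescan collects the distinct values in one pass, opens each file once, and then does one hash lookup per line. It uses Python with the same hypothetical `COLUMN` and tab-separated input. The input is a list so it can be read twice. The demo only prints timings on your machine; no particular numbers are claimed.

```python
import os
import tempfile
import time
from contextlib import ExitStack

COLUMN = 1  # hypothetical: column whose value names the output file


def value_of(line):
    return line.rstrip("\n").split("\t")[COLUMN]


def split_reopen_per_line(lines, out_dir):
    """The trap: an open() and close() (and therefore a flush) for every line."""
    for line in lines:
        # "a" appends, so stale files from an earlier run would keep growing.
        with open(os.path.join(out_dir, value_of(line) + ".txt"), "a") as fh:
            fh.write(line)


def split_two_pass(lines, out_dir):
    """Prescan for distinct values, open each file once, then one dict lookup per line."""
    values = {value_of(line) for line in lines}  # first pass over the data
    with ExitStack() as stack:
        handles = {
            v: stack.enter_context(open(os.path.join(out_dir, v + ".txt"), "w"))
            for v in values
        }
        for line in lines:  # second pass: no open/close inside the loop
            handles[value_of(line)].write(line)


if __name__ == "__main__":
    data = [f"row{i}\tkey{i % 20}\n" for i in range(20_000)]
    for fn in (split_reopen_per_line, split_two_pass):
        with tempfile.TemporaryDirectory() as d:
            start = time.perf_counter()
            fn(data, d)
            print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```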

      Another, completely different method is to append the lines to different strings, one for each column value, and then write all the strings out to files after the loop.

      -Mark
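A sketch of the buffer-then-write alternative, with the same hypothetical `COLUMN` and tab-separated input. In Python, appending each line to a per-value list and joining at the end is the usual stand-in for Perl's `.=` string appends. The trade-off is memory: the whole input is held until the loop finishes.

```python
import os
import tempfile
from collections import defaultdict

COLUMN = 1  # hypothetical: column whose value names the output file


def split_buffered(lines, out_dir):
    """Accumulate lines per value in memory, then write each file once after the loop."""
    chunks = defaultdict(list)
    for line in lines:
        value = line.rstrip("\n").split("\t")[COLUMN]
        chunks[value].append(line)
    # After the loop: exactly one open/write/close per distinct value.
    for value, parts in chunks.items():
        with open(os.path.join(out_dir, value + ".txt"), "w") as fh:
            fh.write("".join(parts))
    return {value: len(parts) for value, parts in chunks.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(split_buffered(["a\tred\n", "b\tblue\n", "c\tred\n"], d))
```

Only one file is open at a time, so this variant avoids the open-file limit entirely at the cost of memory.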

        kvale,
        I expect a single hash lookup to be much faster than a pair of system calls for opening and closing files

        I don't want to sound like I am beating a dead horse here, but that sounds identical to my solution, except that your way seems like it would be slower: instead of figuring it out as you go, you are processing the files twice.

        Cheers - L~R
