in reply to Performance Trap - Opening/Closing Files Inside a Loop

is there more run-time efficient way than my second suggestion in Perl?

Given the original statement of the problem, with or without the (rather disingenuous) extension to the original problem, I would have suggested that it would help matters noticeably if the input were sorted with respect to the column containing the file name.

The sorting would be really easy to do, either prior to passing the data to perl, or within the perl script (though there might be memory issues doing it in the script, if we're talking about millions of lines instead of dozens). I hope the esteemed Java programmer knows about the Unix "sort" command (and the fact that it has been ported to Windows)...
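
A minimal sketch of that pre-sort step, assuming comma-separated input with the output file name in column 3 (the layout supposed further down this thread). The sample rows and the names table.txt and sorted.txt are made up for illustration:

```shell
# Hypothetical sample data: comma-separated, output file name in column 3.
printf '%s\n' \
  'a,1,out2.txt' \
  'b,2,out1.txt' \
  'c,3,out2.txt' \
  'd,4,out1.txt' > table.txt

# Sort on column 3 only, so all lines bound for one output file are contiguous.
# LC_ALL=C forces plain byte-order comparison regardless of locale.
LC_ALL=C sort -t , -k 3,3 table.txt > sorted.txt

cat sorted.txt
```

Note that lines sharing a file name are ordered by whole-line comparison here, not by their original position, so this on its own does not preserve the input order within each file.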


Re^2: Performance Trap - Opening/Closing Files Inside a Loop
by Limbic~Region (Chancellor) on Dec 10, 2004 at 03:36 UTC
    graff,
    Presumably, the files need to be appended in the order encountered. Part of the long story unmentioned is a lot of guarded responses to my inquiries for additional information. A cut | sort | uniq might not be a bad idea to pre-process the file to get a list of unique file names though.
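
A rough sketch of that cut | sort | uniq pre-processing pass, again assuming comma-separated input with the file name in column 3; the sample rows and the names table.txt and names.txt are illustrative only:

```shell
# Hypothetical sample data: comma-separated, output file name in column 3.
printf '%s\n' \
  'a,1,out2.txt' \
  'b,2,out1.txt' \
  'c,3,out2.txt' > table.txt

# Extract column 3, then sort so that uniq can collapse adjacent duplicates.
cut -d , -f 3 table.txt | LC_ALL=C sort | uniq > names.txt

cat names.txt
```

The resulting list tells you up front how many distinct output files there are, which is useful when deciding whether they can all be held open at once.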

    Cheers - L~R

      I wonder if all java programmers are this cagey/evasive about describing their problem sets...

      Even so, now we're just talking about a two-stage sort:

      ## let's suppose the file names are in column 3 of "table.txt":
      perl -pe 's/^/$.,/' table.txt | sort -t , -k 4,4 -k 1,1n | cut -f2- -d , | splitter.pl
      where "splitter.pl" is a version of your suggested script that assumes lines are pre-sorted by output file name -- so it really needs only one output file handle open at any one time. By pre-pending the original line numbers before sorting, and using the line numbers as a secondary sort field, the (presumably) intended result is achieved.

      (update: if the original table has file names in column 3, and a perl script prepends a line number to each line, then the primary sort column has to be 4, not 3.)

        graff,
        I wonder if all java programmers are this cagey/evasive about describing their problem sets

        I doubt it. There are people in all trades and positions who like to keep what they do a secret, for fear that if people know, their importance and value might be diminished. Since my job title has nothing to do with programming, I am assuming solving such a "simple" task with an "inferior" language would be a huge eyesore. Anyway - it is the Java Developer's problem - not mine. I just wanted to present the problem here so that if there was a superior approach that I was missing I could learn it.

        Cheers - L~R