Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Nice idea -- it's something I've had to do before. I would recommed two changes.

1. Instead of specifying the files on the command line, use the Unix "filter" paradigm where you read in a file (either from file or STDIN) and write it out to STDOUT. That way the user could do something like (depending on their shell): cols2lines.pl bigfile > bigfile2 or something like:
for file in *.mat; do echo $file; ./cols2lines.pl $file > $file.2; done in order to process a bunch of files.

2. Don't open and close the file so many times! Use seek instead. It's probably faster. For a file with many cols, you will open and close the file a lot -- that takes up time. I did a quick benchmark and on my system here are the results from reading a large file hundreds of times:

Benchmark: timing 100 iterations of openclose, seek... openclose: 186 wallclock secs (161.45 usr + 19.77 sys = 181.22 CPU) @ + 0.55/s (n=100) seek: 17 wallclock secs (16.08 usr + 1.02 sys = 17.10 CPU) @ 5.85/s +(n=100)
Because your program is doing a lot of I/O and other things (like pushing stuff onto big arrays) not all your time is spent opening and closing files so the speedup won't be as dramatic as the simple benchmark but it will be faster. I've made a small change (changed 3 lines) to your program to use seek instead of repeated open/close. Using the modified code on a file with 1000 columns, it ran about 25% faster than yours (a significant improvement if the file is really big).
Here's your sub bigfiles_colstolines modified to use seek:
sub bigfile_colstolines { my $infile = shift; my $outfile = shift; my $infilehandle = "<$infile"; # read-only open (INFILE, $infilehandle) or die ("File error.\a\n"); my $outfilehandle = ">$outfile"; # write only open (OUTFILE, $outfilehandle) or die ("Output failure.\a\n"); my $line = <INFILE>; my @testarray = split (/$delimiter/, $line); close (INFILE); open (INFILE, $infilehandle) or die ("File error.\a\n"); for (my $counter=0; $counter <= $#testarray; $counter++){ my @columnarray = undef(); while (defined ($line = <INFILE>)){ chomp ($line); my @linearray = split (/$delimiter/, $line); push (@columnarray, $linearray [$counter]); } shift (@columnarray); # removes unwanted characters my $newline = join $delimiter, (@columnarray); print OUTFILE ($newline, "\n"); #rewind the file seek(INFILE,0,0); } close(INFILE); close(OUTFILE); }

In reply to Re: cols2lines.pl by RhetTbull
in thread cols2lines.pl by tfrayner

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chilling in the Monastery: (8)
    As of 2014-09-17 15:55 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      How do you remember the number of days in each month?











      Results (90 votes), past polls