Re^2: Efficient way to sum columns in a file

Thanks all for your comments! As expected, the looping approach is very slow (per Random_Walk's results), and I guess we are better off generating another Perl script with one variable per column, based on the number of columns required. It might not look pretty, but it seems to be the most efficient way to do it.
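A minimal sketch of the "generate the code" idea, assuming comma-separated numeric columns and a column count known before reading the file. The helper name make_summer is hypothetical and not from this thread. It builds the source of a sub with one scalar accumulator per column, compiles it once with a string eval, and returns the compiled sub. It has not been timed against the looping version here.

```perl
use strict;
use warnings;

# Build, at run time, a summing routine specialised for $n columns.
# Each column gets its own scalar accumulator ($s0, $s1, ...), so the
# per-line work has no inner loop over columns.
sub make_summer {
    my ($n) = @_;
    my @idx   = 0 .. $n - 1;
    my $limit = $n + 1;    # stop split after the columns we need
    my $decl  = join ' ', map { "my \$s$_ = 0;" } @idx;
    my $body  = join ' ', map { "\$s$_ += \$f[$_];" } @idx;
    my $ret   = join ', ', map { "\$s$_" } @idx;

    # Source of the generated sub; $decl, $body, $ret and $limit are
    # interpolated now, the escaped variables end up in the generated code.
    my $src = <<"CODE";
sub {
    my (\$fh) = \@_;
    $decl
    while (my \$line = <\$fh>) {
        chomp \$line;
        my \@f = split /,/, \$line, $limit;
        $body
    }
    return ($ret);
}
CODE

    my $sub = eval $src or die "generated code failed to compile: $@";
    return $sub;
}
```

For example, `my @totals = make_summer(25)->(\*STDIN);` would return the 25 column totals of whatever is read from standard input. The string eval happens once per run, so its cost does not grow with the file size.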

Also, I was curious to compare the speed of Unix cut with Perl's split.

So I tested these two commands on a 500K file generated with R's code, except that I output 25 columns instead of 4 and extracted the first 15 columns for the test:

[sk]% time cut -d, -f"1-15" numbers.csv > out.csv
5.670u 0.340s 0:06.27 95.8%

[sk]% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > out.csv
31.950u 0.200s 0:32.26 99.6%

I am surprised that Perl's split is *very* slow compared to the standard Unix cut utility: about 32 s of user CPU time versus under 6 s, more than five times slower. Is this because Perl's split does a lot more work than cut? I see a lot of use cases for Perl in handling large files, but if parsing is a bottleneck, then I need to be careful about when to use it.
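One thing that may account for part of the gap (an assumption, not measured here): with -a, Perl splits every line into all 25 fields of @F even though only 15 are printed, and it builds a separate string for each field, while cut only has to copy bytes between delimiters. split takes an optional third argument, LIMIT, that makes it stop early. A minimal sketch; the helper name first_fields is hypothetical:

```perl
use strict;
use warnings;

# Return the first $k comma-separated fields of $line.
# With LIMIT = $k + 1, split produces at most $k + 1 fields: the first $k
# are real columns and the last one (if present) is the unsplit remainder
# of the line, which is dropped below.
sub first_fields {
    my ($line, $k) = @_;
    my @f = split /,/, $line, $k + 1;
    splice @f, $k if @f > $k;    # discard the remainder, if any
    return @f;
}
```

The command-line equivalent of the test above would be `perl -lne 'print join ",", (split /,/, $_, 16)[0..14]' numbers.csv > out.csv`; whether it closes much of the gap with cut would need to be timed on the same file.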

Thanks again everyone! I enjoyed reading the replies. I especially liked the explanations of eof vs. eof() (a very good example to demonstrate the difference) and also the END {} idea :)

cheers



	PerlMonks