
Re^2: Efficient way to sum columns in a file

by sk (Curate)
on Apr 13, 2005 at 18:12 UTC (#447544)

in reply to Re: Efficient way to sum columns in a file
in thread Efficient way to sum columns in a file

Thanks, all, for your comments! As expected, the looping idea is very slow (per Random_Walk's results), so I guess we are better off "generating" another Perl script with many variables, based on the number of columns required. It might not look pretty, but it seems to be the most efficient way to do it.
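The "generate a script" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the helper name make_column_summer is hypothetical, the input is assumed to be plain comma-separated numbers, and the generated code is compiled with a string eval rather than written out as a separate script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build a summing routine
# specialised for the first $n columns. The generated source keeps one
# scalar per column instead of looping over an array of sums.
sub make_column_summer {
    my ($n) = @_;
    my @vars = map { "\$s$_" } 0 .. $n - 1;    # "$s0", "$s1", ...

    my $src = "sub {\n"
            . "    my (\$fh) = \@_;\n"
            . "    my (" . join(", ", @vars) . ") = (0) x $n;\n"
            . "    while (my \$line = <\$fh>) {\n"
            . "        chomp \$line;\n"
            # Limit the split so columns past the ones we sum stay unsplit.
            . "        my \@f = split /,/, \$line, " . ($n + 1) . ";\n"
            . join("", map { "        $vars[$_] += \$f[$_];\n" } 0 .. $n - 1)
            . "    }\n"
            . "    return (" . join(", ", @vars) . ");\n"
            . "}\n";

    # Compile the generated source into a code reference.
    my $code = eval $src;
    die "generated code failed to compile: $@" unless $code;
    return $code;
}
```

Calling make_column_summer(15) returns a code reference that takes an open filehandle and returns the fifteen column sums as a list. Whether this actually beats a plain loop on a given file would still need to be timed.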

Also, I was curious to see how cut compares with Perl's split.

So I tested the following two commands on a 500K file generated using R's code. However, I output 25 columns instead of 4 and cut out the first 15 columns for testing.

[sk]% time cut -d, -f"1-15" numbers.csv > out.csv
5.670u 0.340s 0:06.27 95.8%

[sk]% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > out.csv
31.950u 0.200s 0:32.26 99.6%

I am surprised that Perl's split is *very* slow compared to the Unix built-in cut. Is this because Perl's split does a lot more than Unix's cut? I see a lot of use cases for Perl in handling large files, but if parsing is a bottleneck then I need to be careful about when to use it.
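One thing that might be worth trying: the -a autosplit splits every line into all 25 fields, even though only 15 are printed. split accepts a third LIMIT argument that makes it stop early. The sketch below shows that variant; the helper name first_fields is hypothetical, and I have not timed it against the numbers above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n comma-separated fields of a
# line, rejoined with commas, without splitting the rest of the line.
sub first_fields {
    my ($line, $n) = @_;
    chomp $line;
    # With LIMIT = $n + 1, everything after field $n stays in one chunk.
    my @f = split /,/, $line, $n + 1;
    pop @f if @f > $n;    # drop the unsplit remainder, if there is one
    return join ",", @f;
}
```

In a filter this would be used as something like: while (my $line = <STDIN>) { print first_fields($line, 15), "\n" }. Even with the limit, split still goes through the regex engine and builds a list of Perl scalars for each line, so it may well remain slower than cut.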

Thanks again, everyone! I enjoyed reading the replies. I especially liked the explanations of eof and eof() (a very good example to demonstrate the difference) and also the END {} idea :)


