Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re^2: Efficient way to sum columns in a file

by sk (Curate)
on Apr 13, 2005 at 18:12 UTC ( #447544=note: print w/ replies, xml ) Need Help??

in reply to Re: Efficient way to sum columns in a file
in thread Efficient way to sum columns in a file

Thanks all for your comments! As expected the looping idea is very slow (Per Random_Walk's results) and I guess we are better off "generating" another perl script with many variables based on the number of columns required. This might not look pretty but seems to be the most efficient way to do it.

Also, I was curious to see the impact of cut and Perl's split.

So I tested these two commands/script on 500K file generated using (R's code)...However I output 25 columns instead of 4 and cut out 15 columns for testing

[sk]% time cut -d, -f"1-15" numbers.csv > out.csv 5.670u 0.340s 0:06.27 95.8%

[sk]% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > o +ut.csv 31.950u 0.200s 0:32.26 99.6%

I am surprised that Perl's split is *very* slow when compared to UNIX built in cut. Is this because Perl's split does a lot more than the Unix's cut? I see a lot of use cases for Perl in handling large files but if parsing is a bottle neck then I need to be careful on when to use it.

Thanks again everyone! I enjoyed reading the replies. Esp i liked the explanantions on eof and eof() (very good example to demonstrate the diff) and also the END {} idea :)



Comment on Re^2: Efficient way to sum columns in a file
Select or Download Code

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://447544]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2016-02-11 11:52 GMT
Find Nodes?
    Voting Booth?

    How many photographs, souvenirs, artworks, trophies or other decorative objects are displayed in your home?

    Results (368 votes), past polls