Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Efficient way to sum columns in a file

by sk (Curate)
on Apr 13, 2005 at 18:12 UTC ( #447544=note: print w/ replies, xml ) Need Help??


in reply to Re: Efficient way to sum columns in a file
in thread Efficient way to sum columns in a file

Thanks all for your comments! As expected the looping idea is very slow (Per Random_Walk's results) and I guess we are better off "generating" another perl script with many variables based on the number of columns required. This might not look pretty but seems to be the most efficient way to do it.

Also, I was curious to see the impact of cut and Perl's split.

So I tested these two commands/script on 500K file generated using (R's code)...However I output 25 columns instead of 4 and cut out 15 columns for testing

[sk]% time cut -d, -f"1-15" numbers.csv > out.csv 5.670u 0.340s 0:06.27 95.8%

[sk]% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > o +ut.csv 31.950u 0.200s 0:32.26 99.6%

I am surprised that Perl's split is *very* slow when compared to UNIX built in cut. Is this because Perl's split does a lot more than the Unix's cut? I see a lot of use cases for Perl in handling large files but if parsing is a bottle neck then I need to be careful on when to use it.

Thanks again everyone! I enjoyed reading the replies. Esp i liked the explanantions on eof and eof() (very good example to demonstrate the diff) and also the END {} idea :)

cheers

SK


Comment on Re^2: Efficient way to sum columns in a file
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://447544]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2014-09-02 05:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (20 votes), past polls