Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re^2: Efficient way to sum columns in a file

by sk (Curate)
on Apr 13, 2005 at 18:12 UTC ( #447544=note: print w/replies, xml ) Need Help??

in reply to Re: Efficient way to sum columns in a file
in thread Efficient way to sum columns in a file

Thanks all for your comments! As expected the looping idea is very slow (Per Random_Walk's results) and I guess we are better off "generating" another perl script with many variables based on the number of columns required. This might not look pretty but seems to be the most efficient way to do it.

Also, I was curious to see the impact of cut and Perl's split.

So I tested these two commands/script on 500K file generated using (R's code)...However I output 25 columns instead of 4 and cut out 15 columns for testing

[sk]% time cut -d, -f"1-15" numbers.csv > out.csv 5.670u 0.340s 0:06.27 95.8%

[sk]% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > o +ut.csv 31.950u 0.200s 0:32.26 99.6%

I am surprised that Perl's split is *very* slow when compared to UNIX built in cut. Is this because Perl's split does a lot more than the Unix's cut? I see a lot of use cases for Perl in handling large files but if parsing is a bottle neck then I need to be careful on when to use it.

Thanks again everyone! I enjoyed reading the replies. Esp i liked the explanantions on eof and eof() (very good example to demonstrate the diff) and also the END {} idea :)



Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://447544]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (3)
As of 2018-02-21 21:19 GMT
Find Nodes?
    Voting Booth?
    When it is dark outside I am happiest to see ...

    Results (288 votes). Check out past polls.