
Re: cut vs split (suggestions)

by Mabooka-Mabooka (Sexton)
on Apr 17, 2005 at 19:53 UTC (#448680)

in reply to cut vs split (suggestions)

Mabooka % head -1 numbers.csv
462,393,252,996,663,603,344,439,139,259,879,766,545,192,477,986,317,77,611,303,79,742,190,556,538
Mabooka % wc -l numbers.csv
500000 numbers.csv
Mabooka %
Mabooka % time perl -lanF, -e 'print join ",", @F[0..4];' numbers.csv > f1
27.820u 0.100s 0:27.92 100.0% 0+0k 0+0io 320pf+0w
Mabooka % time cut -d, -f"1-5" numbers.csv > f2
1.860u 0.100s 0:01.96 100.0% 0+0k 0+0io 100pf+0w
Mabooka % diff f1 f2
Mabooka %

So it's clear what to use: cut is about 14 times faster here (if it's a bottleneck problem rather than an academic dispute).
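
For anyone who wants to repeat the timing: the post doesn't say how numbers.csv was produced, so what follows is only a sketch under assumptions. The function name make_numbers_csv is made up, and the value range (integers 0..999) is a guess based on the sample line shown by head -1.

```shell
# make_numbers_csv ROWS COLS
# Hypothetical helper (not from the original post): prints ROWS lines of
# COLS comma-separated random integers in 0..999 to standard output.
make_numbers_csv() {
    awk -v rows="$1" -v cols="$2" 'BEGIN {
        srand(42)                       # fixed seed so runs are repeatable
        for (r = 1; r <= rows; r++) {
            line = int(rand() * 1000)   # first field, no leading comma
            for (c = 2; c <= cols; c++)
                line = line "," int(rand() * 1000)
            print line
        }
    }'
}
```

Something like "make_numbers_csv 500000 25 > numbers.csv" should then give a file of the same shape as the one timed above; the actual numbers, and so the timings, will of course differ.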

Now, back to the original problem (sum up columns): on my system, for 500,000 rows the run time is negligible, so I tried with 5,000,000 rows x 25 columns:
Mabooka % time perl -nle'my($a,$b,$c,$d,$e)=split /,/;$ta+=$a, $tb+=$b, $tc+=$c, $td+=$d, $te+=$e;END{print join " ", $ta,$tb,$tc,$td,$te}' numbers.csv
2499084140 2499188390 2500073650 2497725180 2495867770
45.270u 0.200s 0:45.44 100.0% 0+0k 0+0io 322pf+0w
Mabooka %
Mabooka % time sum5.cut_n_awk
2499084140 2499188390 2500073650 2497725180 2495867770
18.520u 0.490s 0:12.52 151.8% 0+0k 0+0io 575pf+0w
Mabooka %
where:
Mabooka % cat sum5.cut_n_awk
#
cat numbers.csv | cut -f1,2,3,4,5 -d, |awk -F, '{s1 += $1; s2 += $2; s3+= $3; s4+=$4; s5+=$5} END {printf ("%.0f %.0f %.0f %.0f %.0f\n", s1, s2,s3,s4,s5)}'

A 3-4x difference (0:45.44 vs. 0:12.52 elapsed) isn't bad. Maybe this will help...
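
The cut-then-awk pipeline can also be wrapped in a function that takes the file name as an argument and lets cut read the file directly. This is just a sketch of the same idea, not the script that was timed; the name sum_first_five is made up for illustration.

```shell
# sum_first_five FILE
# Hypothetical wrapper (not the timed script): cut keeps only the first
# five comma-separated fields, then awk adds each column up and prints
# the five totals on one line.
sum_first_five() {
    cut -d, -f1-5 "$1" |
        awk -F, '{ s1 += $1; s2 += $2; s3 += $3; s4 += $4; s5 += $5 }
                 END { printf("%.0f %.0f %.0f %.0f %.0f\n", s1, s2, s3, s4, s5) }'
}
```

Usage would be "sum_first_five numbers.csv". The %.0f format is kept from the script above so that large totals print as plain integers rather than in exponent notation.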

Replies are listed 'Best First'.
Re^2: cut vs split (suggestions)
by merlyn (Sage) on Apr 17, 2005 at 22:15 UTC
      LOL :-). Thanks for the article; I hadn't seen it.

      Actually, the main reason I use cat is so that I don't have to memorize all of grep's options. E.g.:
      Mabooka % grep -w awk *awk* | sort | uniq | wc -l
      10
      Mabooka % cat *awk* | grep -w awk | sort | uniq | wc -l
      8
      Mabooka %
      It's also handy for reusing the previous command. As for the performance implication: it's minimal (somebody has to do the disk I/O anyway).
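
The two counts above presumably differ because grep, when given several files, prefixes each matching line with its file name, so identical lines from different files no longer look identical to uniq. A small sketch of the effect follows; the function names are made up, and it assumes a grep that supports -h (suppress file names), which GNU and BSD grep do.

```shell
# count_unique_prefixed PATTERN FILE...
# Hypothetical helper: counts unique matching lines as grep prints them,
# including the "file:" prefix it adds when more than one file is given.
count_unique_prefixed() {
    pattern=$1; shift
    grep -w "$pattern" "$@" | sort | uniq | wc -l | tr -d ' '
}

# count_unique_plain PATTERN FILE...
# Same count, but -h drops the file name prefix, so the result should
# match what "cat FILE... | grep -w PATTERN" gives.
count_unique_plain() {
    pattern=$1; shift
    grep -h -w "$pattern" "$@" | sort | uniq | wc -l | tr -d ' '
}
```

With two files that share one matching line, the first function counts that line twice (once per file) and the second counts it once, which is the kind of gap seen in the 10 vs. 8 above.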
