http://www.perlmonks.org?node_id=447353


in reply to Efficient way to sum columns in a file

I generated 500,000 lines of random CSV with this script:

#!/usr/bin/perl
use strict;
use warnings;

# create source numbers if they don't exist
my $many   = 500000;
my $source = 'numbers.csv';

open CSV, '>', $source or die "can't write to $source: $!\n";
for (1..$many) {
    my @numbers;
    push @numbers, (int rand 1000) for (1..5);
    print CSV join ",", @numbers;
    print CSV $/;
}
close CSV;
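
A quick sanity check that the file came out the right shape (the expected count follows directly from $many):

nph> wc -l numbers.csv
500000 numbers.csv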

Then I tried a few one-liners to sum the columns. I ran each twice and post the second timing, to allow for filesystem caching.

nph> time cat numbers.csv | perl -nle'@d=split /,/;$a[$_]+=$d[$_] for (0..4);END{print join "\t", @a}'
249959157   249671314   249649377   250057435   249420634

real 0m17.10s
user 0m15.46s
sys  0m0.08s

nph> time perl -nle'my @d=split /,/;$a[$_]+=$d[$_] for (0..4);END{print join "\t", @a}' numbers.csv
249959157   249671314   249649377   250057435   249420634

real 0m13.71s
user 0m12.77s
sys  0m0.04s

nph> time perl -nle'my($a,$b,$c,$d,$e)=split /,/;$ta+=$a, $tb+=$b, $tc+=$c, $td+=$d, $te+=$e;END{print join "\t", $ta,$tb,$tc,$td,$te}' numbers.csv
249959157   249671314   249649377   250057435   249420634

real 0m6.45s
user 0m5.91s
sys  0m0.07s

The last one was consistently faster than the second across several more runs of each. Dropping the cat pipeline saves a process and a copy through the pipe, and splitting straight into scalars avoids the per-line for loop and the array-element lookups.
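
To separate the parsing cost from the file I/O, here is a minimal sketch using the core Benchmark module to compare just the two inner loops on a single representative line (the sample line is an assumption; any 5-column CSV line will do, and absolute numbers will vary by machine):

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# a representative input line (assumed; matches the generator's format)
my $line = '493,17,842,366,250';

my @a;                                   # totals for the array version
my ($ta, $tb, $tc, $td, $te) = (0) x 5;  # totals for the scalar version

# run each sub for at least 3 CPU seconds and compare rates
cmpthese(-3, {
    # split into an array, then loop over the indices
    array_loop => sub {
        my @d = split /,/, $line;
        $a[$_] += $d[$_] for 0 .. 4;
    },
    # split directly into scalars, no loop
    scalars => sub {
        my ($a, $b, $c, $d, $e) = split /,/, $line;
        $ta += $a; $tb += $b; $tc += $c; $td += $d; $te += $e;
    },
});

With the I/O out of the picture, whatever gap remains between the two is the cost of the index loop itself.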

Cheers,
R.

Pereant, qui ante nos nostra dixerunt! (May they perish who said our things before us!)