The following code produces identical results to choroba's code but uses less than 1/4 of the memory (180MB vs 795MB for my test dataset) and runs more quickly:
#! perl -slw
use strict;
use List::Util qw[ first ];
my @headers = split ' ', scalar <>;
my $f = first { $headers[$_] eq 'Strand' } 0 .. $#headers;
my( $cCounts, $wCounts, $n, %index ) = ( '', '', 0 );
while( <> ) {
    chomp;
    my @F = split ' ';
    my $index = $index{ $F[ $f+1 ] }{ $F[ $f + 2 ] } //= $n++;
    ++vec( $F[ $f ] eq 'w' ? $wCounts : $cCounts, $index, 8 );
}
while( my( $key, $subhash ) = each %index ) {
    while( my( $subkey, $index ) = each %{ $subhash } ) {
        print join "\t", $key, $subkey, vec( $cCounts, $index, 8 ), vec( $wCounts, $index, 8 );
    }
}
__END__
1177246.pl 1177246.dat > 1177246.out
It assumes no count will exceed 255, the largest value an 8-bit vec element can hold. If that's too small, change the three 8s to 16s (raising the limit to 65535) for a small increase in memory consumption.
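For anyone unfamiliar with the trick, here is a minimal standalone sketch of the same idea: each key pair gets a small integer slot, and the two counts live in packed strings accessed with vec rather than in nested hashes of scalars. The sample rows and the variable names $strand, $key, $subkey and $i are made up for illustration; they are not from the original data.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [ strand, key, subkey ]
my @rows = (
    [ 'w', 'chr1', 100 ],
    [ 'c', 'chr1', 100 ],
    [ 'w', 'chr1', 100 ],
    [ 'c', 'chr2', 7   ],
);

my( $cCounts, $wCounts, $n, %index ) = ( '', '', 0 );

for my $row ( @rows ) {
    my( $strand, $key, $subkey ) = @$row;

    # Assign the next free slot the first time a key pair is seen.
    my $i = $index{ $key }{ $subkey } //= $n++;

    # Each slot is one 8-bit element of a packed string.
    if( $strand eq 'w' ) { ++vec( $wCounts, $i, 8 ) }
    else                 { ++vec( $cCounts, $i, 8 ) }
}

# Reading a slot that was never written returns 0.
for my $key ( sort keys %index ) {
    for my $subkey ( sort keys %{ $index{ $key } } ) {
        my $i = $index{ $key }{ $subkey };
        print join( "\t", $key, $subkey, vec( $cCounts, $i, 8 ), vec( $wCounts, $i, 8 ) ), "\n";
    }
}
```

The saving comes from storing one byte per counter instead of a full Perl scalar per counter; the hash is still needed, but only to map key pairs to slot numbers.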