in reply to Calculating the average on a timeslice of data

Can you specify how much "a lot of data" is? Are we talking thousands of lines, or millions, or billions? Personally I would split each line on whitespace, then do something like:

if ($line[1] eq "0109") { $sum += $line[2]; $n++; }   # eq, not ==: the date is a string, and == would also match "109"

I ran this on a file with 1,000,000 lines and it took about 3 seconds to run. There were 83,333 matches for that particular date.

This is a single pass over the file, so the complexity is O(n) in the number of lines and the run time should scale linearly with input size.
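For completeness, here is a minimal sketch of the whole loop around that check. It assumes each line has whitespace-separated fields in the order id, date, value (so the date is field 1 and the value is field 2, as in the snippet above); the sub name average_for_date and the file name on the command line are made up for illustration. It compares the date with eq so that a field such as 109 does not match 0109.

```perl
use strict;
use warnings;

# Hypothetical helper: average field 2 over the lines whose field 1 equals $date.
# Assumes whitespace-separated fields laid out as: id date value.
sub average_for_date {
    my ($fh, $date) = @_;
    my ($sum, $n) = (0, 0);
    while (my $row = <$fh>) {
        my @line = split ' ', $row;    # split on runs of whitespace
        next unless @line >= 3;        # skip blank or short lines
        if ($line[1] eq $date) {       # string compare keeps the leading zero
            $sum += $line[2];
            $n++;
        }
    }
    # Return (average, count), or (undef, 0) when nothing matched.
    return $n ? ($sum / $n, $n) : (undef, 0);
}

# Usage: perl avg.pl data.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($avg, $count) = average_for_date($in, '0109');
    close $in;
    print defined $avg ? "average=$avg over $count lines\n" : "no matches\n";
}
```

Returning the count along with the average makes it easy to spot a date that matched nothing, rather than dividing by zero.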

Re^2: Calculating the average on a timeslice of data
by perlbrother (Initiate) on Jul 06, 2011 at 15:49 UTC
    There are about 40,000 lines and unique date values. Do you think it makes more sense to read the values into a hash of hashes and use something like $hash{$id}{$date}?