PerlMonks  

Re: Calculating the average on a timeslice of data

by zek152 (Pilgrim)
on Jul 06, 2011 at 15:40 UTC ( [id://913007]=note )


in reply to Calculating the average on a timeslice of data

Can you specify how much "a lot of data" is? Are we talking thousands of lines, millions, or billions? Personally I would split each line on whitespace, then do something like:

if ($line[1] eq "0109") { $sum += $line[2]; $n++; }

I ran this on a file with 1000000 lines and it took about 3 seconds to run. There were 83333 matches for that particular date.

The complexity is O(n) in the number of lines, so the run time should scale linearly with input size.
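Putting the pieces above together, a minimal sketch might look like the following. The column layout (id, date, value, separated by whitespace) is an assumption based on the snippet, and the sub name average_for_date and the sample data are made up for illustration. It uses eq rather than ==, because a numeric comparison would also treat "109" as equal to "0109".

```perl
use strict;
use warnings;

# Average the value column over all lines whose date column equals $wanted.
# Assumed layout (whitespace-separated): id, date, value.
sub average_for_date {
    my ($fh, $wanted) = @_;
    my ($sum, $n) = (0, 0);
    while (my $row = <$fh>) {
        my @line = split ' ', $row;      # split on runs of whitespace
        next unless @line >= 3;          # skip blank or short lines
        if ($line[1] eq $wanted) {       # string compare keeps the leading zero
            $sum += $line[2];
            $n++;
        }
    }
    return $n ? $sum / $n : undef;       # undef when nothing matched
}

# Usage with made-up sample data held in memory instead of a real file.
my $sample = "a 0109 10\nb 0110 99\nc 0109 20\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $avg = average_for_date($fh, '0109');
print defined $avg ? "average: $avg\n" : "no matches\n";
```

The single pass over the file is what keeps this at O(n); only the running sum and count are held in memory.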

Replies are listed 'Best First'.
Re^2: Calculating the average on a timeslice of data
by perlbrother (Initiate) on Jul 06, 2011 at 15:49 UTC
    There are about 40,000 lines and unique date values. Do you think it makes more sense to read the values into hashes and use something like $hash{$id}{$date}?
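For illustration, the hash idea could be sketched roughly as below. This assumes the id is the first whitespace-separated field, followed by the date and the value; the sub name totals_by_id_and_date, the sum and n keys, and the sample data are all hypothetical. It keeps a running sum and count for every id and date pair, so any average can be looked up after one pass.

```perl
use strict;
use warnings;

# Build $hash{$id}{$date} = { sum => ..., n => ... } in a single pass.
# Assumed layout (whitespace-separated): id, date, value.
sub totals_by_id_and_date {
    my ($fh) = @_;
    my %hash;
    while (my $row = <$fh>) {
        my ($id, $date, $value) = split ' ', $row;
        next unless defined $value;          # skip blank or short lines
        $hash{$id}{$date}{sum} += $value;    # autovivifies the nested hashes
        $hash{$id}{$date}{n}++;
    }
    return \%hash;
}

# Usage with made-up sample data held in memory.
my $sample = "a 0109 10\na 0109 20\nb 0110 99\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $totals = totals_by_id_and_date($fh);
my $cell   = $totals->{a}{'0109'};           # quote the date so it stays a string
printf "average for a on 0109: %.1f\n", $cell->{sum} / $cell->{n};
```

Whether this beats the single comparison depends on how many different dates need averaging: for one date the plain loop is simpler, while the hash pays off when many id and date combinations are queried from the same pass.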
