Re: How to save memory, parsing a big file.

by graff (Chancellor)
on Mar 01, 2006 at 22:02 UTC


in reply to How to save memory, parsing a big file.

Definitely go with a DBM approach as described above, to move the hash structure to disk. Apart from that, I'm wondering why you use two different hashes with identical keys (%total and %connects), and why you test a condition that would obviously never be false (if $total{foobar}{from} exists, there's no point testing whether $total{foobar}{to} doesn't exist, since "from" and "to" are both assigned at the same time).
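For what it's worth, here is a minimal sketch of the DBM idea. It assumes the core SDBM_File module (DB_File or GDBM_File are tied the same way), a made-up file location, and a hypothetical add_record helper. Because a plain DBM file stores only flat strings, the sketch packs the per-key fields into one tab-separated record rather than a sub-hash; a module such as MLDBM would be the usual route if the nested structure has to be kept.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use SDBM_File;                 # core module; small records only
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for the real DBM location.
my $dir = tempdir( CLEANUP => 1 );

my %total;
tie( %total, 'SDBM_File', "$dir/totals", O_RDWR | O_CREAT, 0666 )
    or die "Cannot tie DBM file: $!";

# Hypothetical helper: one flat record per key, laid out as
# from <TAB> to <TAB> connects <TAB> bytes.
# Assumption: connects starts at 1 on first sight of a key.
sub add_record {
    my ( $key, $from, $to, $bytes ) = @_;
    if ( exists $total{$key} ) {
        my ( $f, $t, $connects, $sum ) = split /\t/, $total{$key};
        $total{$key} = join "\t", $f, $t, $connects + 1, $sum + $bytes;
    }
    else {
        $total{$key} = join "\t", $from, $to, 1, $bytes;
    }
}

# Made-up sample data.
add_record( '10.0.0.1:80', '10.0.0.1', '10.0.0.2:80', 100 );
add_record( '10.0.0.1:80', '10.0.0.1', '10.0.0.2:80', 250 );

my @fields = split /\t/, $total{'10.0.0.1:80'};
print join( ', ', @fields ), "\n";

untie %total;
```

Every update here is a read, split, join and write against the disk file, so this trades speed for memory; whether that is acceptable depends on the size of the log.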

I think the following would be equivalent to the OP code in terms of what it does, but might take less memory and might run a bit faster:

    while (<LOG>) {
        my ($source, $sport, $to, $dport, $proto, $packs, $bytes) = split;
        my $key = "$source$dport";
        if ( exists( $total{$key}{from} )) {
            $total{$key}{connects}++;
            $total{$key}{bytes} += $bytes;
        }
        else {
            $total{$key} = {
                from  => $source,
                to    => "$to:$dport",
                bytes => $bytes,
            };  # maybe should set 'connects => 1' as well?
        }
        $total += $bytes;
    }
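To try that loop without a real log file, something like the following should do. The sample lines, their field order and the in-memory filehandle are assumptions made up for illustration, not taken from the OP's data; the loop body itself is the same as above.

```perl
use strict;
use warnings;

# Made-up log lines, assumed field order:
# source sport to dport proto packets bytes
my $log = <<'END';
10.0.0.1 1025 10.0.0.9 80 tcp 4 400
10.0.0.1 1026 10.0.0.9 80 tcp 2 150
10.0.0.2 1027 10.0.0.9 22 tcp 1 60
END

# Read the string through a filehandle, standing in for LOG.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my ( %total, $total );    # per-key records and the grand byte total
while (<$fh>) {
    my ( $source, $sport, $to, $dport, $proto, $packs, $bytes ) = split;
    my $key = "$source$dport";
    if ( exists $total{$key}{from} ) {
        # Key seen before: count the repeat and add its bytes.
        $total{$key}{connects}++;
        $total{$key}{bytes} += $bytes;
    }
    else {
        # First sighting: one sub-hash holds every field for this key.
        $total{$key} = {
            from  => $source,
            to    => "$to:$dport",
            bytes => $bytes,
        };
    }
    $total += $bytes;
}
close $fh;

for my $key ( sort keys %total ) {
    printf "%s connects=%d bytes=%d\n",
        $key, $total{$key}{connects} // 0, $total{$key}{bytes};
}
print "total bytes: $total\n";
```

Note that, as written, connects only counts repeat sightings of a key, which is why the question about initialising it to 1 matters.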

Here are a few (potentially meaningless) benchmarks about the trade-off between more top-level (simple, flat) hashes vs. a single top-level hash with more sub-hash keys (I put a "sleep" in there so I could study the memory/time consumption once the hashes were filled):

    perl -e '$k="aaaaa"; for $i (1..1_000_000) { $h1{$k}={foo=>"bar",bar=>"foo",iter=>$i}; $h1{$k}{total}++; $k++} sleep 20'
    ## consumes 344 MB in ~14.4 sec

    perl -e '$k="aaaaa"; for $i (1..1_000_000) {$h1{$k}={foo=>"bar",bar=>"foo",iter=>$i}; $h2{$k}++; $k++} sleep 20'
    ## consumes 352 MB in ~15.0 sec

    perl -e '$k="aaaaa"; for $i (1..1_000_000) {$h1{$k}={foo=>"bar",bar=>"foo"}; $h2{$k}++; $h3{$k}=$i; $k++} sleep 20'
    ## consumes 360 MB in ~16.5 sec

So given that you are using one HoH already, there's a slight advantage in not creating a second (or third) hash with the same set of primary keys -- better to add another key to the sub-hash instead.
