PerlMonks

Re: write hash to disk after memory limit

by LanX (Bishop)
on Mar 13, 2015 at 12:51 UTC (#1119950)


in reply to write hash to disk after memory limit

I'm not sure if I understand your question completely...

... but I once had a giant hash that kept swapping, and I solved it by splitting it into a two-tier HoH (hash of hashes).

If you can organize the upper tier roughly according to the timeline of your process, your system will only swap the necessary lower hashes on demand.
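A minimal Perl sketch of the two-tier idea, assuming the keys of the flat hash can be split into a coarse part that follows the timeline of the process and a fine part. The `phase:item` key format and the sample values are hypothetical. A pass over one phase then only touches that phase's lower hash.

```perl
use strict;
use warnings;

# Hypothetical flat layout: one big hash keyed by "phase:item".
my %flat;
for my $phase (1 .. 2) {
    for my $item (1 .. 3) {
        $flat{"phase$phase:item$item"} = $phase * 100 + $item;
    }
}

# Two-tier HoH: the upper key follows the timeline of the process
# (here the "phase"), the lower hash holds everything for that phase.
my %tiered;
for my $key (keys %flat) {
    my ($phase, $item) = split /:/, $key, 2;
    $tiered{$phase}{$item} = $flat{$key};
}

# A pass over one phase only touches that phase's lower hash.
for my $phase (sort keys %tiered) {
    my $lower = $tiered{$phase};
    my $total = 0;
    $total += $lower->{$_} for keys %$lower;
    printf "%s: %d items, total %d\n", $phase, scalar(keys %$lower), $total;
}
```

With the made-up data it prints `phase1: 3 items, total 306` and `phase2: 3 items, total 606`.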

I already described this here; I'll update the link once I find it.

HTH! :)

update

see Re: Small Hash a Gateway to Large Hash?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)

PS: Je suis Charlie!


Re^2: write hash to disk after memory limit
by hailholyghost (Novice) on Mar 13, 2015 at 13:30 UTC
    Thanks a lot. I've been using a hash-hash-array-array structure to keep memory use down. I think array access is also faster than hash access, so I did this:
        foreach my $rat (@directories) {
            print "Reading Merged_99$rat/bs_seeker-CG.tab ...\n";
            open(FH,"<Merged_99$rat/bs_seeker-CG.tab")
                or die "cannot read Merged_99$rat/bs_seeker-CG.tab: $!";
            while (<FH>) {
                if (/M/) {
                    next;
                }
                elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\.(\d+)\s+(\d+)/)
                       && ($1 ~~ @CHROMOSOMES) && ($5 >= $MINIMUM_COVERAGE)) {
                    #chromosome $1, methylated C $2, percent $3.$4 and coverage $5
                    $DATA{$1}{$2}[$set][$replicate] = "$3.$4";
                }
                elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\s+(\d+)/)
                       && ($1 ~~ @CHROMOSOMES) && ($4 >= $MINIMUM_COVERAGE)) {
                    $DATA{$1}{$2}[$set][$replicate] = $3;
                }
            }
            close FH;
            $replicate++;
        }
      As I said, better

      > > organize the upper tier roughly according to the timeline of your process

      No idea where $set comes from, but $replicate could be such a top tier.

      So $data[$set][$replicate]{$1}{$2} should cause far fewer memory-swapping problems (AFAIS).
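A rough sketch of that layout, with `$set` and `$replicate` as the top tiers. It is self-contained, so it reads made-up sample rows through in-memory filehandles instead of the real `bs_seeker-CG.tab` files; `%is_wanted_chromosome` and the threshold of 5 are hypothetical stand-ins for `@CHROMOSOMES` and `$MINIMUM_COVERAGE`. Two liberties are taken: the two regexes are folded into one, and the smartmatch `~~` is swapped for a hash lookup, since smartmatch is experimental (and deprecated as of Perl 5.38).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the poster's @CHROMOSOMES and $MINIMUM_COVERAGE.
my %is_wanted_chromosome = map { $_ => 1 } qw(1 2 X);
my $minimum_coverage     = 5;

# Made-up rows: chromosome, position, third column, methylation, coverage.
my %sample_files = (
    rep0 => "chr1\t100\t101\t0.75\t10\nchr2\t200\t201\t1\t8\nchrM\t5\t6\t0.5\t9\n",
    rep1 => "chr1\t100\t101\t0.5\t3\nchrX\t300\t301\t0\t12\n",
);

my @data;    # $data[$set][$replicate]{$chromosome}{$position} = methylation
my $set       = 0;
my $replicate = 0;

for my $name (sort keys %sample_files) {
    # In-memory filehandle; the real loop would open the .tab file instead.
    open(my $fh, '<', \$sample_files{$name}) or die "cannot read $name: $!";
    while (my $line = <$fh>) {
        next if $line =~ /M/;    # same chrM filter as the original
        # One regex for both "0.75" and "1" style values; hash lookup instead of ~~.
        if ($line =~ /^chr(\S+)\s+(\d+)\s+\d+\s+(\d(?:\.\d+)?)\s+(\d+)/
            && $is_wanted_chromosome{$1}
            && $4 >= $minimum_coverage) {
            $data[$set][$replicate]{$1}{$2} = 0 + $3;    # store as a number
        }
    }
    close $fh;
    $replicate++;
}
```

Everything for one replicate now sits under a single `$data[$set][$replicate]` hash, so it can be processed, written out, or dropped as one unit.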

      (BTW, better reserve uppercase variable names for Perl built-ins.)

      If this structure doesn't fit into your future plans, you most likely want to use a DB anyway.
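If the data really has to leave RAM, one low-tech option (an assumption on my part, not necessarily the kind of DB meant above) is a hash tied to a DBM file. This sketch uses the core `SDBM_File` module. Because DBM files only hold flat strings, the nested keys are joined with `:` (a hypothetical scheme), and the file name in a temp directory is made up. SDBM_File caps each key/value pair at roughly 1 KB, so for bigger or more structured records DB_File or an SQL database via DBI is the more usual choice.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use File::Temp qw(tempdir);
use SDBM_File;

# Hypothetical file name in a throwaway directory;
# SDBM_File keeps the data in "<name>.pag" plus "<name>.dir".
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/methylation";

tie(my %on_disk, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "cannot tie $file: $!";

# DBM files hold flat strings only, so the nested keys
# ($set, $replicate, chromosome, position) are joined into one key.
$on_disk{ join ':', 0, 0, '1', 100 } = 0.75;
$on_disk{ join ':', 0, 1, 'X', 300 } = 0;
untie %on_disk;    # flushes and closes the DBM file

# Later (or in another process): reopen read-only and look a value up.
tie(my %reopened, 'SDBM_File', $file, O_RDONLY, 0666)
    or die "cannot reopen $file: $!";
my $value = $reopened{'0:0:1:100'};
my $count = scalar keys %reopened;
print "0:0:1:100 => $value ($count entries on disk)\n";
untie %reopened;
```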

      Cheers Rolf

      Do you later use the value as a string or as a number? If you use it as a number, I believe you could save quite a bit of memory by forcing a numeric conversion before storing the data. The way you do it, you end up with a scalar containing both the string and (as soon as you use it as a number for the first time) the number.

          ...
                  $DATA{$1}{$2}[$set][$replicate] = 0 + "$3.$4";
              }
              elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\s+(\d+)/)
                     && ($1 ~~ @CHROMOSOMES) && ($4 >= $MINIMUM_COVERAGE)) {
                  $DATA{$1}{$2}[$set][$replicate] = 0 + $3;
          ...
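One way to see the difference described here, sketched with the flag inspection offered by the core `B` module. The helper names `has_string_buffer` and `has_numeric_value` are made up. The sketch only shows which slots a scalar carries; it does not measure bytes (the CPAN module Devel::Size could). After its first numeric use, the string-stored value should report both a string and a number, while `0 + "0.75"` should carry only the number.

```perl
use strict;
use warnings;
use B ();

# True if the scalar behind $ref carries a string buffer (POK flag).
sub has_string_buffer {
    my ($ref) = @_;
    return (B::svref_2object($ref)->FLAGS & B::SVp_POK()) ? 1 : 0;
}

# True if the scalar behind $ref carries a cached number (NOK or IOK flag).
sub has_numeric_value {
    my ($ref) = @_;
    return (B::svref_2object($ref)->FLAGS & (B::SVp_NOK() | B::SVp_IOK())) ? 1 : 0;
}

my $stored_as_string = "0.75";            # like storing "$3.$4"
my $incremented = $stored_as_string + 1;  # first numeric use caches a number too

my $stored_as_number = 0 + "0.75";        # like storing 0 + "$3.$4"

printf "string version: string=%d number=%d\n",
    has_string_buffer(\$stored_as_string), has_numeric_value(\$stored_as_string);
printf "number version: string=%d number=%d\n",
    has_string_buffer(\$stored_as_number), has_numeric_value(\$stored_as_number);
```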

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.
