Re^3: Processing ~1 Trillion records

by Jenda (Abbot)
on Oct 25, 2012 at 12:08 UTC (#1000838)

in reply to Re^2: Processing ~1 Trillion records
in thread Processing ~1 Trillion records

You seem to be accumulating lots of data in the hashes. Are you sure it all fits in memory? As soon as you force the computer to swap memory pages to disk, the processing time grows insanely!

It might help to tie the hashes to a DBM file (DB_File, MLDBM, ...) or to use SQLite or some other database to hold the temporary data. Doing as much of the work as you can upfront in the Oracle database would most probably be even better, though. Sometimes a use DB_File; tie %data, 'DB_File', 'filename.db'; is all you need to change something from unacceptably slow to just fine.
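A minimal sketch of that tie, in case it helps. The file name and the keys and values are made up for illustration, and I am assuming a Berkeley DB hash file ($DB_HASH) opened read/write; adapt it to whatever your real hashes hold.

```perl
use strict;
use warnings;
use Fcntl;      # provides O_RDWR and O_CREAT
use DB_File;    # provides the tie class and $DB_HASH

# Hypothetical file name; any writable path will do.
my $file = 'filename.db';

# Tie the hash to an on-disk Berkeley DB hash file,
# creating the file if it does not exist yet.
tie my %data, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $file: $!";

# From here on %data is used like an ordinary hash, but the
# entries are stored in the file instead of in process memory.
$data{"key$_"} = $_ * 2 for 1 .. 1000;    # placeholder data
$data{key42}++;

print "key42 => $data{key42}\n";

# Flush and close the file.
untie %data;
```

Keep in mind that DB_File stores only flat strings as values. If your hashes hold references (hashes of arrays and so on), you need something like MLDBM on top to serialize them.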

Enoch was right!
Enjoy the last years of Rome.

Re^4: Processing ~1 Trillion records
by aossama (Acolyte) on Oct 25, 2012 at 12:36 UTC
    Is this like using Redis to store/retrieve the key-value?

      Yes. You could use Redis itself; it seems to have a Perl binding. The whole point is to make sure the process fits in memory and that the data that has to be moved to disk is accessed and updated efficiently.

      Enoch was right!
      Enjoy the last years of Rome.
