Re: Processing ~1 Trillion records

by mpeppler (Vicar)
on Oct 26, 2012 at 06:43 UTC


in reply to Processing ~1 Trillion records

I've only glanced at the various answers quickly, so maybe I'm off the mark, but:

My immediate reaction to needing to process that many rows is to parallelize the work. It will put a higher load on the DB, but handling that kind of concurrent load is what the DB is really good at. Obviously your dataset needs to be partitionable, but I can't imagine a dataset of that size that can't be split in some way.
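
As a rough sketch of what I mean (assuming a numeric id column on a hypothetical big_table, and a placeholder DSN and credentials -- adjust to your schema and server): fork a handful of workers and have each one scan its own, non-overlapping id range over its own connection.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $n_workers = 8;                          # tune to what the DB can take
    my ($min_id, $max_id) = (1, 1_000_000_000); # assumed id range of big_table
    my $chunk = int( ($max_id - $min_id + 1) / $n_workers ) + 1;

    my @pids;
    for my $w (0 .. $n_workers - 1) {
        my $lo = $min_id + $w * $chunk;
        my $hi = $lo + $chunk - 1;
        $hi = $max_id if $hi > $max_id;

        my $pid = fork;
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: open its own handle (DBI handles shouldn't be shared
            # across a fork) and scan only its own slice of the key range.
            my $dbh = DBI->connect('dbi:Sybase:server=BIGSERVER',  # placeholder DSN
                                   'user', 'password', { RaiseError => 1 });
            my $sth = $dbh->prepare(
                'SELECT id, payload FROM big_table WHERE id BETWEEN ? AND ?');
            $sth->execute($lo, $hi);
            while (my $row = $sth->fetchrow_arrayref) {
                # ... per-row processing goes here ...
            }
            $dbh->disconnect;
            exit 0;
        }
        push @pids, $pid;       # parent just collects worker pids
    }
    waitpid($_, 0) for @pids;   # wait for all workers to finish

The point is only the shape of it: each child gets its own connection and its own WHERE clause, and the DB does the heavy lifting in parallel.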

Michael


Re^2: Processing ~1 Trillion records
by Anonymous Monk on Oct 26, 2012 at 12:47 UTC
    Also, you need to be sure that it can produce results continuously over all those many days. If the program as written dies fifteen minutes before it starts writing its first output file (with all the accumulated data still sitting in RAM at that point), the entire run up to then is wasted. No good.
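
    A rough sketch of one way to avoid that, assuming the work can be cut into chunks whose results are aggregated independently: flush each chunk's results to disk as soon as the chunk finishes and mark it done, so a crash costs at most one chunk instead of the whole run.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Stand-in chunks; in reality each would come from one partition's query.
        my @chunks = ( [qw(a b a c)], [qw(b b d)], [qw(a d d)] );

        for my $chunk_no (0 .. $#chunks) {
            my $done = "results_$chunk_no.done";
            next if -e $done;                    # finished in an earlier run: skip

            my %counts;
            $counts{$_}++ for @{ $chunks[$chunk_no] };   # per-chunk aggregation

            my $tmp = "results_$chunk_no.tmp";
            open my $fh, '>', $tmp or die "open $tmp: $!";
            print {$fh} "$_\t$counts{$_}\n" for sort keys %counts;
            close $fh or die "close $tmp: $!";

            # Rename only once the chunk is completely written, so a crash can
            # never leave a half-written file that looks finished.
            rename $tmp, $done or die "rename $tmp: $!";
        }

    On restart the loop skips every chunk that already has a .done file, so the cost of a failure is bounded by one chunk's worth of work.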
