
Re: Processing ~1 Trillion records

by mpeppler (Vicar)
on Oct 26, 2012 at 06:43 UTC

in reply to Processing ~1 Trillion records

I've only glanced at the various answers quickly, so maybe I'm off the mark, but:

My immediate reaction to needing to process that many rows is to parallelize the work. It will put a higher load on the DB, but handling concurrent queries is exactly what the DB is good at. Obviously your dataset needs to be partitionable, but I can't imagine a dataset of that size that can't be split in some way.
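To make the idea concrete, here is a minimal sketch (not the poster's code) of the partitioning approach in Python: split a numeric key range into contiguous chunks and hand each chunk to its own worker. The key bounds, the partition count, and the per-partition work are all hypothetical; in a real job each worker would run something like `SELECT ... WHERE id BETWEEN ? AND ?` against its own DB connection.

```python
from multiprocessing import Pool

def partition_ranges(lo, hi, parts):
    """Split the inclusive key range [lo, hi] into `parts` contiguous chunks."""
    step = (hi - lo + 1) // parts
    ranges = []
    start = lo
    for i in range(parts):
        # Last chunk absorbs any remainder so the whole range is covered.
        end = hi if i == parts - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def process_partition(bounds):
    lo, hi = bounds
    # Stand-in for the real per-slice query and aggregation;
    # here each worker just counts the keys in its slice.
    return hi - lo + 1

if __name__ == "__main__":
    ranges = partition_ranges(1, 1_000_000, 8)
    with Pool(8) as pool:
        counts = pool.map(process_partition, ranges)
    assert sum(counts) == 1_000_000  # every key handled exactly once
```

The same structure works with `Parallel::ForkManager` in Perl; the essential point is that the partitions are disjoint and together cover the whole key space, so workers never duplicate or skip rows.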


Replies are listed 'Best First'.
Re^2: Processing ~1 Trillion records
by Anonymous Monk on Oct 26, 2012 at 12:47 UTC
    Also, you need to be sure it can keep producing results continuously over all those days. If the program as written dies fifteen minutes before writing its first output file (all data living only in RAM up to that point), the entire run is wasted. No good.
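    The usual fix is to flush results incrementally and checkpoint progress, so a crash costs at most one batch instead of the whole run. A minimal sketch, with hypothetical file names and batch logic:

    ```python
    import json
    import os

    CHECKPOINT = "progress.json"

    def load_checkpoint():
        """Return the index of the last completed batch, or -1 if starting fresh."""
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return json.load(f)["last_batch"]
        return -1

    def save_checkpoint(batch_no):
        # Write to a temp file, then atomically rename, so a crash
        # mid-write can never leave a corrupt checkpoint behind.
        tmp = CHECKPOINT + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"last_batch": batch_no}, f)
        os.replace(tmp, CHECKPOINT)

    def run(batches):
        start = load_checkpoint() + 1  # resume where the last run stopped
        for i in range(start, len(batches)):
            with open(f"out_{i}.txt", "w") as f:
                f.write(str(sum(batches[i])))  # results hit disk every batch
            save_checkpoint(i)
    ```

    On restart, `run` skips everything already committed to disk, so a fifteen-minute failure costs fifteen minutes, not days.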
