Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Processing ~1 Trillion records

by mpeppler (Vicar)
on Oct 26, 2012 at 06:43 UTC ( #1001007=note: print w/replies, xml ) Need Help??


in reply to Processing ~1 Trillion records

I've only glanced at the various answers quickly, so maybe I'm off the mark, but:

My immediate reaction to needing to process that many rows is to try to parallelize the process. It will put a higher load on the DB, but that's what the DB is really good at. Obviously your dataset needs to be partitionable, but I can't imagine a dataset of that size that can't be split in some way.

Michael

Replies are listed 'Best First'.
Re^2: Processing ~1 Trillion records
by Anonymous Monk on Oct 26, 2012 at 12:47 UTC
    Also you need to be sure that it's able to produce results continuously over all those many days. If the program as-writ dies fifteen minutes before starting to write its first file (all data spewing out of RAM only at that point) the entire length of time is waste. No good.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1001007]
help
Chatterbox?
[hippo]: Ah
[Corion]: But maybe there is some other Unicode string that will be true but have a zero width
[hippo]: For explanation, I've seen this construct in someone else's code (no names, no pack drill) and couldn't think of a situation to trigger it.
[Corion]: You'll have to look somewhere esoteric for that. Maybe some tied variable or special dualvar can also trigger that. But it's certainly not a common occurrence
[Corion]: And on 5.20, the following also outputs no find:perl -wle 'for my $x ("\x{2000}".."\ x{1fffff}") { if( $x && ! length $x ) { warn qq(<$x>); warn length $x; die } }'
[Corion]: (this time on Unix)
[hippo]: Understood. I'll have to go through the code and see if it's doing anything fancy with ties, dual-vars or non-scalars. In the end, it's probably a bug though.

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (14)
As of 2017-07-27 13:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I came, I saw, I ...
























    Results (413 votes). Check out past polls.