PerlMonks  

Re: selecting columns from a tab-separated-values file

by hippo (Curate)
on Jan 21, 2013 at 22:59 UTC


in reply to selecting columns from a tab-separated-values file

If by "1B" you mean 10^9 records, and your fields average 9 characters, then counting the tabs you have roughly 500GB in one file, correct? I'm not surprised it is very slow. How long does it take just to cat the file, and how much slower than that is your script?
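Both questions can be answered with a quick measurement. What follows is a minimal sketch in Python rather than the thread's Perl. It assumes 50 fields per record, because the original question isn't quoted here and 50 is the count that the 500GB figure implies at 10 bytes per field. The helper names `estimate_tsv_size`, `time_raw_read` and `time_select_columns` are hypothetical. The raw-read loop stands in for `cat`, so the gap between the two timings roughly measures the cost of splitting lines and selecting columns.

```python
import os
import tempfile
import time


def estimate_tsv_size(records, fields_per_record, mean_field_len):
    """Rough size in bytes: each field is its characters plus one
    separator byte (a tab, or the newline after the last field)."""
    return records * fields_per_record * (mean_field_len + 1)


def time_raw_read(path, chunk_size=1 << 20):
    """Baseline comparable to `cat file > /dev/null`: read bytes, do nothing."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start


def time_select_columns(path, cols):
    """Same pass, but split each line on tabs and pick out the wanted
    0-based columns. Returns (elapsed seconds, rows processed)."""
    start = time.perf_counter()
    rows = 0
    with open(path, "rb") as f:
        for line in f:
            fields = line.rstrip(b"\n").split(b"\t")
            _picked = [fields[i] for i in cols if i < len(fields)]
            rows += 1
    return time.perf_counter() - start, rows


if __name__ == "__main__":
    # 10**9 records x 50 fields x (9 chars + 1 separator) = the 500GB guess.
    print(estimate_tsv_size(10**9, 50, 9) / 1e9, "GB")  # 500.0 GB
    # Small synthetic file, only to exercise the two timers.
    with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as tmp:
        tmp.write((b"\t".join([b"x" * 9] * 50) + b"\n") * 20_000)
    raw = time_raw_read(tmp.name)
    parsed, rows = time_select_columns(tmp.name, [0, 3, 7])
    print(f"raw read {raw:.3f}s, split+select {parsed:.3f}s over {rows} rows")
    os.unlink(tmp.name)
```

Run against the real file, a large ratio between the two timings points at the parsing loop. A small ratio means the disk is the bottleneck.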

My best advice is to buy the fastest disk you can afford, and perhaps to think about preprocessing the file.
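One form of preprocessing is to pay for a full pass once and write only the needed columns to a much smaller file. This assumes the same few columns are wanted on every run, which the thread doesn't actually say. What follows is a minimal Python sketch. `extract_columns` and the column indices are hypothetical.

```python
import os
import tempfile


def extract_columns(src_path, dst_path, cols):
    """One pass over a large TSV, writing only the 0-based columns in
    `cols` to a smaller TSV. Fields missing from short lines become empty.
    Returns the number of rows written."""
    rows = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            fields = line.rstrip(b"\n").split(b"\t")
            picked = [fields[i] if i < len(fields) else b"" for i in cols]
            dst.write(b"\t".join(picked) + b"\n")
            rows += 1
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "big.tsv")
        dst = os.path.join(d, "small.tsv")
        with open(src, "wb") as f:
            f.write(b"id\tname\t\tscore\n1\tann\t\t7\n2\tbob\tx\t9\n")
        extract_columns(src, dst, [0, 3])  # keep "id" and "score" only
        with open(dst, "rb") as f:
            print(f.read().decode(), end="")
```

Later runs then read only the reduced file, so their cost scales with the columns actually kept rather than with the full record width.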


Re^2: selecting columns from a tab-separated-values file
by ibm1620 (Beadle) on Jan 22, 2013 at 04:09 UTC
    Yes, 10^9 records. The whole file is more like 80GB, since there are frequent empty fields. I estimate that my Perl program takes around 5 hours to make one pass over the entire file. I don't know how much faster cat is, but it's GOT to be faster than that (I'm not at work at the moment, so I can't check).
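Putting numbers on these figures shows how far the script is from disk speed. Reading 80GB in 5 hours averages about 4.4 MB/s. Each record averages only 80 bytes, well below the 500 bytes per record behind the 500GB estimate. The 100 MB/s disk rate below is an assumed figure for comparison, not a measurement from the thread. The helper names are hypothetical.

```python
def throughput_mb_s(total_bytes, seconds):
    """Average rate in decimal megabytes (10**6 bytes) per second."""
    return total_bytes / seconds / 1e6


def pass_time_hours(total_bytes, mb_per_s):
    """Hours needed to stream total_bytes at a steady mb_per_s."""
    return total_bytes / (mb_per_s * 1e6) / 3600


file_bytes = 80 * 10**9      # "more like 80GB"
records = 10**9              # 10^9 records
script_seconds = 5 * 3600    # estimated 5 hours for one pass

print(f"{file_bytes / records:.0f} bytes per record")             # 80
print(f"{throughput_mb_s(file_bytes, script_seconds):.2f} MB/s")  # 4.44
# Hypothetical disk sustaining 100 MB/s sequential reads (assumed figure):
print(f"{pass_time_hours(file_bytes, 100):.2f} h at 100 MB/s")    # 0.22
```

If cat really does run near disk speed, the roughly 13 minutes versus 5 hours gap would sit almost entirely in the per-line processing. That would make the script, not the disk, the first place to look.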
