PerlMonks  

Re: Biggest file?

by erix (Vicar)
on Dec 17, 2011 at 12:23 UTC (#944068=note)


in reply to Biggest file?

1. UniProt (Swiss-Prot + TrEMBL), the monthly-updated protein information database. We load these data files into a database. UniProt.org also provides the same data as XML (same URL as the files below), but I find those files too large to download, handle, and process. The smaller .dat files are plain text:

file        URL  max size  OS     fs    description   length    format
-------------------------------------------------------------------------------------
Swiss-Prot  (1)   2.4 GB   Linux  ext3  protein info  variable  free text (multiline)
TrEMBL      (2)  47.5 GB   Linux  ext3  protein info  variable  free text (multiline)


  (1): Swiss-Prot (curated data): uniprot_sprot.dat
  (2): TrEMBL (uncurated data): uniprot_trembl.dat

UniProt grows quickly, too: see the growth graphs on the Swiss-Prot and TrEMBL statistics pages.

2. Sometimes I have to munge a database dump in text form; such dumps can be hundreds of GB.

3. Semi-continuously processed data files range from tiny up to 1 GB (XML and CSV, on Linux).
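
For the .dat files in item 1, a file the size of uniprot_trembl.dat is usually handled by streaming it one entry at a time instead of reading it whole. Below is a minimal Python sketch, assuming the usual UniProt flat-file layout: each entry is a block of lines that start with two-letter codes (ID, AC, ...) and ends with a line containing only `//`. The function names `iter_dat_records` and `accession` are made up for illustration, and the two sample entries are invented toy data.

```python
import io
from typing import Iterator, List, TextIO


def iter_dat_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield one flat-file entry at a time as a list of lines.

    Assumes each entry ends with a line containing only "//".
    Only the current entry is kept in memory.
    """
    record: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            if record:
                yield record
            record = []
        else:
            record.append(line)
    if record:  # tolerate a missing final "//"
        yield record


def accession(record: List[str]) -> str:
    """Return the first accession on the first AC line, e.g. "AC   X1; X2;" -> "X1"."""
    for line in record:
        if line.startswith("AC   "):
            return line[5:].split(";")[0].strip()
    raise ValueError("entry has no AC line")


# Two invented toy entries standing in for a real .dat file.
sample = io.StringIO(
    "ID   TOY1_HUMAN   Reviewed;  100 AA.\n"
    "AC   P00001; Q00001;\n"
    "//\n"
    "ID   TOY2_HUMAN   Unreviewed;  50 AA.\n"
    "AC   A00002;\n"
    "//\n"
)
print([accession(r) for r in iter_dat_records(sample)])  # ['P00001', 'A00002']
```

On a real file you would pass `open("uniprot_sprot.dat")` instead of the `StringIO`, or `gzip.open(path, "rt")` if you keep the file compressed; memory use then depends on the largest single entry, not on the file size.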
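
For the database dumps in item 2, the usual approach at hundreds of GB is a streaming filter: read a line, transform it, write it out, so memory is bounded by the longest line rather than by the dump size. A minimal Python sketch follows; `munge_dump` and the schema rename in `rename_schema` are hypothetical examples, not part of any dump tool, and a purely line-based transform assumes that nothing you need to rewrite spans a line break.

```python
import io
from typing import Callable, TextIO


def munge_dump(src: TextIO, dst: TextIO, transform: Callable[[str], str]) -> int:
    """Copy src to dst line by line, applying transform to each line.

    Only one line is held at a time, so memory is bounded by the
    longest line, not by the size of the dump. Returns the line count.
    """
    count = 0
    for line in src:
        dst.write(transform(line))
        count += 1
    return count


def rename_schema(line: str) -> str:
    """Hypothetical example transform: rename a schema prefix."""
    return line.replace("old_schema.", "new_schema.")


src = io.StringIO(
    "CREATE TABLE old_schema.t (id int);\n"
    "INSERT INTO old_schema.t VALUES (1);\n"
)
dst = io.StringIO()
print(munge_dump(src, dst, rename_schema))  # 2
print(dst.getvalue(), end="")
```

For real files, open both ends with `open(path, encoding=...)`, or with `gzip.open(path, "rt")` when the dump is compressed; the loop itself stays the same.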
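
For the XML files in item 3, building a complete tree from a 1 GB document can need considerably more memory than the file size; an incremental parser avoids that. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse` and clears finished records as it goes. The `<rows>`/`<row>` layout, the name `iter_records`, and the assumption that records are direct children of the root element are all illustrative.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def iter_records(source: BinaryIO, tag: str) -> Iterator[Dict[str, str]]:
    """Stream <tag> elements, yielding each one's child elements as {tag: text}.

    Assumes the records are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: (child.text or "") for child in elem}
            root.clear()  # drop finished records so the tree does not grow


data = io.BytesIO(
    b"<rows>"
    b"<row><id>1</id><name>a</name></row>"
    b"<row><id>2</id><name>b</name></row>"
    b"</rows>"
)
print(list(iter_records(data, "row")))
# [{'id': '1', 'name': 'a'}, {'id': '2', 'name': 'b'}]
```

The CSV side needs no special handling: `csv.reader` over an open file already yields one row at a time.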

