Re: Biggest file?

by erix (Vicar)
on Dec 17, 2011 at 12:23 UTC ( #944068=note )


in reply to Biggest file?

1. UniProt (= Swiss-Prot + TrEMBL), the protein info database, which is updated monthly. We load these data files into a database. Uniprot.org also makes the data available in XML form (same URL as below), but I find those files too large to download, handle, and process. The (smaller) .dat files are regular text files:

name        URL   maxsize   OS     fs    description   length    format
------------------------------------------------------------------------
Swiss-Prot  (1)    2.4 GB   linux  ext3  protein info  variable  free text (multiline)
TrEMBL      (2)   47.5 GB   linux  ext3  protein info  variable  free text (multiline)


  (1): Swiss-Prot (curated data): uniprot_sprot.dat
  (2): TrEMBL (uncurated data): uniprot_trembl.dat

UniProt grows pretty fast too: see the graphs on the Swiss-Prot and TrEMBL stats pages.
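Files this size are normally read one entry at a time rather than slurped whole. A minimal sketch in Python, assuming the usual flat-file convention that each entry ends with a line containing only "//". The names iter_entries and count_entries are made up for illustration, and this is not the loader we actually use:

```python
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one flat-file entry at a time as a list of lines.

    Assumption: an entry ends with a line containing only "//".
    Only the current entry is held in memory.
    """
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "//":
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a final entry that lacks the terminator
        yield entry


def count_entries(path: str) -> int:
    """Count entries in a .dat file by streaming it line by line."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in iter_entries(handle))
```

Because iter_entries is a generator over any iterable of lines, memory use depends on the largest single entry, not on the size of the file, so the same loop should work for the 2.4 GB file and the 47.5 GB one.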

2. Sometimes it's necessary to munge a database dump (in text form). Such dumps can be hundreds of GB.

3. Semi-continuously processed data files vary from tiny to 1 GB (XML + CSV, linux).

