PerlMonks
Re: Biggest file?
by erix (Priest) on Dec 17, 2011 at 12:23 UTC (#944068)
1. The UniProt (= Swiss-Prot + TrEMBL) protein info database, updated monthly. We load these data files into a database. Uniprot.org also makes this data available in XML form (same URL as below), but I find those files too large to download/handle/process. The (smaller) .dat files are regular text files:

URL             size / maxsize  OS     fs    description   length    format
------------------------------------------------------------------------------------------
Swiss-Prot (1)  2.4 GB          linux  ext3  protein info  variable  free text (multiline)
TrEMBL (2)      47.5 GB         linux  ext3  protein info  variable  free text (multiline)

(1): Swiss-Prot (curated data): uniprot_sprot.dat
(2): TrEMBL (uncurated data): uniprot_trembl.dat

UniProt grows pretty fast too: see the graphs on the Swiss-Prot and TrEMBL stats pages.

2. Sometimes it's necessary to munge a database dump (in text form). These can be 100s of GB.

3. Semi-continuously processed data files vary from tiny to 1 GB (xml+csv, linux).
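For files this size, the usual trick is to stream one entry at a time rather than read the whole file. Below is a minimal sketch of that idea, not the loader we actually use. It assumes the UniProt flat-file convention that each entry ends with a line containing only `//` and that the entry name is the second field of the `ID` line. The helper names `iter_records` and `entry_name` are made up for this illustration.

```python
def iter_records(handle):
    """Yield one entry at a time as a list of lines.

    Assumes entries are terminated by a line holding only '//'.
    Only the current entry is held in memory.
    """
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            if record:
                yield record
            record = []
        else:
            record.append(line)
    if record:
        # Tolerate a final entry that lacks the '//' terminator.
        yield record


def entry_name(record):
    """Return the second field of the 'ID' line, or None if there is none."""
    for line in record:
        if line.startswith("ID "):
            return line.split()[1]
    return None
```

Something like `with open("uniprot_sprot.dat") as fh: for rec in iter_records(fh): ...` would then walk the file entry by entry, so memory use depends on the largest single entry rather than on the file size.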