Re^3: Getting/handling big files w/ perl

Gisel:

In addition to BrowserUk's tips, one I've found useful is to filter out any data you don't need. In the past I had to process a horrendous amount of credit card transaction information, and filtering out the data I didn't need saved quite a bit of storage space[*]. So if the resulting files contain a large amount of data you'll never use, it may be worthwhile to filter it before storing it.
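Here's a minimal Perl sketch of the filter-before-storing idea. It assumes a *hypothetical* layout of one record per line with '|'-separated fields. The `filter_records` name, the sample records and the "drop zero-amount records" rule are all made up for illustration, so adapt them to whatever your data actually looks like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record layout: one record per line, fields separated by '|'.
# filter_records() copies only the records accepted by $want (an optional
# code ref that receives the field list) and, from those, only the columns
# whose indexes are listed in $keep.
sub filter_records {
    my ($in, $out, $keep, $want) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\|/, $line, -1;    # -1: keep trailing empty fields
        next if $want && !$want->(@fields);    # drop records we never use
        my @slim = map { defined $_ ? $_ : '' } @fields[@$keep];
        print {$out} join('|', @slim), "\n";
        $kept++;
    }
    return $kept;    # number of records written
}

# Demo with in-memory filehandles; in practice these would be real files.
my $raw = join '', map { "$_\n" }
    '2009-01-05|A123|12.50|STORE 42|notes',
    '2009-01-05|B456|0.00|STORE 7|void',
    '2009-01-06|C789|99.95|STORE 42|notes';
open my $in, '<', \$raw or die "open input: $!";
my $slim = '';
open my $out, '>', \$slim or die "open output: $!";

# Keep only the date and amount columns, and skip zero-amount records.
my $n = filter_records($in, $out, [0, 2], sub { $_[2] != 0 });
close $out;
print "kept $n records:\n$slim";
```

Because it streams line by line, memory use stays flat no matter how big the input file is, so the same loop works on multi-gigabyte files.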

You mention that the input files are in NetCDF format, so I took a quick look at Wikipedia's NetCDF article and saw that there are already some Unix command-line tools available for file surgery. So if you know which items you need from the files, you may be able to chop out a good bit of data and avoid compression altogether. If you're storing the files locally first, you can probably avoid the time cost of a separate filtering pass by making the filter the same operation that copies the files to long-term storage (saving some network traffic to your SAN in the bargain).
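One way to make the filter double as the copy is to have a NetCDF tool such as `ncks` from the NCO toolkit (assuming you have NCO installed and on your PATH) read the local file and write only the variables you want straight to the long-term storage path. This is just a sketch: the script name, variable names and paths are hypothetical, and you should check the ncks documentation for the options that suit your files (for example, how it handles an output file that already exists).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the ncks command that extracts only @vars from $src and writes the
# result straight to $dest, so the filtering step doubles as the copy to
# long-term storage. Assumes NCO's ncks is installed and on PATH.
sub ncks_extract_cmd {
    my ($src, $dest, @vars) = @_;
    die "no variables requested\n" unless @vars;
    return ('ncks', '-v', join(',', @vars), $src, $dest);
}

# Run it with the list form of system() so no shell quoting is involved.
sub archive_subset {
    my ($src, $dest, @vars) = @_;
    my @cmd = ncks_extract_cmd($src, $dest, @vars);
    my $rc  = system(@cmd);
    die "could not run ncks: $!\n" if $rc == -1;
    die "ncks failed for $src (exit status ", $? >> 8, ")\n" if $rc != 0;
    return $dest;
}

# Hypothetical usage: perl subset.pl in.nc /archive/out.nc temperature,pressure
if (@ARGV == 3) {
    my ($src, $dest, $list) = @ARGV;
    archive_subset($src, $dest, split /,/, $list);
}
```

Since the subset is written directly to the destination, only the reduced file crosses the network, and you never have to store (or compress) the full-size copy in the archive.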

*: My original purpose wasn't to save disk space, but to use a single file format for my process. The incoming data came in multiple, very different formats (about 15 of them, IIRC), and the processor needed the files sorted and in a different format. The resulting space savings (substantial!) were just a by-product of the input file formats.

Update: Fixed acronym... (I wonder what IIRS might mean? D'oh!)

...roboticus

When your only tool is a hammer, all problems look like your thumb.

	PerlMonks