File Compression

by selva (Scribe)
on Jun 12, 2009 at 12:36 UTC

selva has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

My requirement is as follows: I need to compress 100 files using a compression tool (zip, gzip, bzip2, ...). While compressing, if the compressed file reaches 2 GB I have to start a new compressed file, because on Linux the maximum file size is 2 GB.

Is there a Perl module, or any other solution, that meets the above requirement?

Re: File Compression
by scorpio17 (Canon) on Jun 12, 2009 at 14:24 UTC

    If you're using unix, you can do something like this:

    tar cvzf - mydir | split -d -b 2G - mybackup.tar.gz.

    This will archive the contents of mydir, using gzip compression. The archive will be split into files having names like mybackup.tar.gz.00, mybackup.tar.gz.01, mybackup.tar.gz.02, etc. Each of these backup files will be no larger than 2 Gig.
    You can unpack the archive like this:

    cat mybackup.tar.gz.* | tar xvzf -

    For more info, read the man pages on tar and split.
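
    If this needs to be driven from a Perl script, a minimal sketch (assuming GNU tar and split are on the PATH, and reusing the example names mydir and mybackup from above) is simply to hand the same pipeline to the shell:

        use strict;
        use warnings;

        # Let the shell run the tar | split pipeline; each piece stays under 2 GB.
        my $dir    = 'mydir';             # directory to archive (example name)
        my $prefix = 'mybackup.tar.gz.';  # prefix for the split pieces (example name)

        system("tar czf - $dir | split -d -b 2G - $prefix") == 0
            or die "archive/split pipeline failed: $?";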

Re: File Compression
by marto (Cardinal) on Jun 12, 2009 at 12:40 UTC

    "While compressing if the compressed file reaches 2 GB then i have to create new compressed file because In Linux maximum file size is 2GB"

    That is a rather sweeping statement, which file system are you using?

    Update: You may want to look at Algorithm::Knapsack and the associated script filesack: "The filesack program finds one or more subsets of files or directories with the maximum total size not exceeding a given size."
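
    Even without that module, a first-fit-decreasing pass in plain Perl gets reasonably close. The sketch below (the file names are hypothetical, and it budgets on uncompressed sizes) groups files so that no group exceeds 2 GiB before compression:

        use strict;
        use warnings;

        # Greedy first-fit-decreasing sketch (not Algorithm::Knapsack's interface):
        # pack files into groups whose total uncompressed size stays under a limit.
        my $limit = 2 * 1024**3;          # 2 GiB per group
        my @files = glob('*.dat');        # hypothetical input files
        my @groups;                       # each group: { size => bytes so far, files => [...] }

        for my $file (sort { -s $b <=> -s $a } @files) {
            my $size   = -s $file;
            my ($slot) = grep { $_->{size} + $size <= $limit } @groups;
            if (!$slot) {
                push @groups, { size => 0, files => [] };
                $slot = $groups[-1];
            }
            $slot->{size} += $size;
            push @{ $slot->{files} }, $file;
        }

        # Each entry in @groups can now be compressed into its own archive.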

    Martin

Re: File Compression
by romandas (Pilgrim) on Jun 12, 2009 at 13:58 UTC

    Assuming for a moment that the Linux filesystem you are using supports large files (are you using some flavor of FAT?), the 2 GB limit you are hitting most likely comes from the application you are using to create the files in the first place.

    For example, many libpcap packages are compiled without the options -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64, which enable large-file support. Without these options set, any libpcap-aware application will only write pcap traces up to 2 GB and then crash. This is likely your problem with whatever compression application you are using. Try finding one built with large-file support.
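
    If the tool doing the writing happens to be a Perl script, a quick sanity check (a small sketch using the core Config module) is whether perl itself was built with large-file support:

        use strict;
        use warnings;
        use Config;

        # Report whether this perl was built with large-file support.
        print "uselargefiles: ", ($Config{uselargefiles} || 'no'), "\n";
        print "lseeksize:     $Config{lseeksize} bytes\n";   # 8 means 64-bit file offsets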

Re: File Compression
by rovf (Priest) on Jun 12, 2009 at 13:02 UTC
    On Linux the maximum file size is 2 GB.

    How old is the Linux you are using? Looking here does not suggest that you would run into such a limit.

    Also, I guess your problem is not compressing the files (because compressing would make them smaller anyway), but creating an archive of the compressed files. Since you can separate these steps (first compressing, then archiving), the question is whether we have an archive module where you can fix the maximum size of the archive created. I don't think there is (it's a pretty special requirement), but you can easily add up the file sizes yourself, can't you?

    BTW, how does tar react if you just stuff into it file after file until the result would exceed the allowed maximum file size? Did you try this?
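
    Adding up the sizes yourself might look like the rough sketch below (it assumes Archive::Tar with the COMPRESS_GZIP constant and invents backup-NN.tar.gz output names). It budgets on uncompressed sizes, so the resulting archives usually land well under the limit:

        use strict;
        use warnings;
        use Archive::Tar;    # COMPRESS_GZIP needs a reasonably recent Archive::Tar

        my $limit = 2 * 1024**3;    # 2 GiB
        my @files = @ARGV;          # files to archive, given on the command line
        my ($part, $total, @batch) = (0, 0);

        for my $file (@files) {
            my $size = -s $file;
            # Start a new archive when the running (uncompressed) total would pass the limit.
            if (@batch && $total + $size > $limit) {
                Archive::Tar->create_archive(sprintf('backup-%02d.tar.gz', $part++),
                                             COMPRESS_GZIP, @batch);
                ($total, @batch) = (0);
            }
            $total += $size;
            push @batch, $file;
        }
        Archive::Tar->create_archive(sprintf('backup-%02d.tar.gz', $part),
                                     COMPRESS_GZIP, @batch) if @batch;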

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: File Compression
by targetsmart (Curate) on Jun 12, 2009 at 12:57 UTC
    If you are using the ext3 file system (common in Linux), the maximum file size is 16 GiB to 2 TiB, depending on the block size.
    First check whether those compression tools have options for splitting their output;
    otherwise, compress the hundred files, then try to split the result into 2 GB pieces.
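
    Compressing each file individually first is easy enough with a core module; a small sketch (the *.log pattern is just an example) using IO::Compress::Gzip:

        use strict;
        use warnings;
        use IO::Compress::Gzip qw(gzip $GzipError);

        # Compress each file on its own; the resulting .gz files can then be
        # grouped or split however is convenient.
        for my $file (glob('*.log')) {
            gzip $file => "$file.gz"
                or die "gzip failed for $file: $GzipError";
        }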

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.

      "otherwise, compress hundred files, then try to split into 2 GB pieces."

    I'm not sure I follow your advice here: if the OP is complaining that, for whatever reason, they can't write files > 2 GB, how are they going to do this?

      Martin

        If I understood targetsmart right, he didn't mean "create an archive of all the files and then split into 2GB chunks", but "compress the files, then partition them into sufficiently small sets, then create the archives".

        -- 
        Ronald Fischer <ynnor@mm.st>
Re: File Compression
by Transient (Hermit) on Jun 12, 2009 at 13:39 UTC
    As stated above, I don't think you can specify the output chunks beforehand. The brute force methodology would be:
    Pseudo code:

        # foreach $file ( @files ) {
        #     save off a copy of the current compressed file if it exists
        #     attempt to add to current compressed file
        #     if compression fails or new file size > 2GB
        #         rename the saved compressed-file copy to the working copy
        #         start a new compressed file
        #         add file to compressed file
        #     end
        # end foreach
    Of course this runs into problems if a single file compresses to >2GB
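
    One concrete (if slow) rendering of that rollback idea, sketched with Archive::Zip and hypothetical backup-NN.zip names:

        use strict;
        use warnings;
        use Archive::Zip qw(:ERROR_CODES);

        my $limit = 2 * 1024**3;    # 2 GB
        my $part  = 0;
        my $zip   = Archive::Zip->new;

        for my $file (@ARGV) {
            my $member = $zip->addFile($file) or die "can't add $file";
            my $name   = sprintf 'backup-%02d.zip', $part;
            $zip->writeToFileNamed($name) == AZ_OK or die "write of $name failed";

            if (-s $name > $limit) {    # too big: back the file out of this archive...
                $zip->removeMember($member);
                $zip->writeToFileNamed($name) == AZ_OK or die "rewrite of $name failed";
                $part++;                # ...and start a new archive holding just this file
                $zip = Archive::Zip->new;
                $zip->addFile($file) or die "can't add $file";
                $zip->writeToFileNamed(sprintf 'backup-%02d.zip', $part) == AZ_OK
                    or die "write failed";
            }
        }
        # A single file that compresses to more than 2 GB still ends up oversized,
        # exactly the caveat mentioned above.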
      Of course this runs into problems if a single file compresses to >2GB

      At least you know then that on your platform, the size limit can't be 2 GB ;-)

      -- 
      Ronald Fischer <ynnor@mm.st>
        Touché! I should have said: Of course this runs into problems if a single file would otherwise compress to >2GB =D
