File Compression

by selva (Scribe)
on Jun 12, 2009 at 12:36 UTC

selva has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

My requirement is as follows: I need to compress 100 files using a compression tool (zip, gzip, bzip2, ...). While compressing, if the compressed file reaches 2 GB I have to start a new compressed file, because on Linux the maximum file size is 2 GB.

Is there a Perl module, or any other solution, that meets the above requirement?

Re: File Compression
by scorpio17 (Canon) on Jun 12, 2009 at 14:24 UTC

    If you're using unix, you can do something like this:

    tar cvzf - mydir | split -d -b 2G - mybackup.tar.gz.

    This will archive the contents of mydir, using gzip compression. The archive will be split into files having names like mybackup.tar.gz.00, mybackup.tar.gz.01, mybackup.tar.gz.02, etc. Each of these backup files will be no larger than 2 Gig.
    You can unpack the archive like this:

    cat mybackup.tar.gz.* | tar xvzf -

    For more info, read the man pages on tar and split.
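
    If this needs to be driven from a Perl script, a minimal sketch (assuming GNU tar and split are on the PATH, and reusing the example names mydir and mybackup from above) is simply to hand the same pipeline to the shell:

        use strict;
        use warnings;

        # Let the shell run the tar | split pipeline; each piece stays under 2 GB.
        my $dir    = 'mydir';             # directory to archive (example name)
        my $prefix = 'mybackup.tar.gz.';  # prefix for the split pieces (example name)

        system("tar czf - $dir | split -d -b 2G - $prefix") == 0
            or die "archive/split pipeline failed: $?";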

Re: File Compression
by marto (Cardinal) on Jun 12, 2009 at 12:40 UTC

    "While compressing if the compressed file reaches 2 GB then i have to create new compressed file because In Linux maximum file size is 2GB"

    That is a rather sweeping statement, which file system are you using?

    Update: You may want to look at Algorithm::Knapsack and the associated script filesack: "The filesack program finds one or more subsets of files or directories with the maximum total size not exceeding a given size."
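
    Even without that module, a first-fit-decreasing pass in plain Perl gets reasonably close. The sketch below (the file names are hypothetical, and it budgets on uncompressed sizes) groups files so that no group exceeds 2 GiB before compression:

        use strict;
        use warnings;

        # Greedy first-fit-decreasing sketch (not Algorithm::Knapsack's interface):
        # pack files into groups whose total uncompressed size stays under a limit.
        my $limit = 2 * 1024**3;          # 2 GiB per group
        my @files = glob('*.dat');        # hypothetical input files
        my @groups;                       # each group: { size => bytes so far, files => [...] }

        for my $file (sort { -s $b <=> -s $a } @files) {
            my $size   = -s $file;
            my ($slot) = grep { $_->{size} + $size <= $limit } @groups;
            if (!$slot) {
                push @groups, { size => 0, files => [] };
                $slot = $groups[-1];
            }
            $slot->{size} += $size;
            push @{ $slot->{files} }, $file;
        }

        # Each entry in @groups can now be compressed into its own archive.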

    Martin

Re: File Compression
by romandas (Pilgrim) on Jun 12, 2009 at 13:58 UTC

    Assuming for a moment that the Linux filesystem you are using supports large files (are you using some flavor of FAT?), the 2 GB limit you are hitting most likely comes from the application you are using to create the files in the first place.

    For example, many libpcap packages are compiled without the options -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64, which enable large-file support. Without these options set, any libpcap-aware application will only write pcap traces up to 2 GB and then crash. This is likely your problem with whatever compression application you are using. Try finding one built with large-file support.
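
    If the tool doing the writing happens to be a Perl script, a quick sanity check (a small sketch using the core Config module) is whether perl itself was built with large-file support:

        use strict;
        use warnings;
        use Config;

        # Report whether this perl was built with large-file support.
        print "uselargefiles: ", ($Config{uselargefiles} || 'no'), "\n";
        print "lseeksize:     $Config{lseeksize} bytes\n";   # 8 means 64-bit file offsets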

Re: File Compression
by rovf (Priest) on Jun 12, 2009 at 13:02 UTC
    On Linux the maximum file size is 2 GB.

    How old is the Linux you are using? Looking here does not suggest that you would run into such a limit.

    Also, I guess your problem is not compressing the files (because compressing would make them smaller anyway), but creating an archive of the compressed files. Since you can separate these steps (first compressing, then archiving), the question is whether we have an archive module where you can fix the maximum size of the archive created. I don't think there is (it's a pretty special requirement), but you can easily add up the file sizes yourself, can't you?

    BTW, how does tar react if you just stuff into it file after file until the result would exceed the allowed maximum file size? Did you try this?
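
    Adding up the sizes yourself might look like the rough sketch below (it assumes Archive::Tar with the COMPRESS_GZIP constant and invents backup-NN.tar.gz output names). It budgets on uncompressed sizes, so the resulting archives usually land well under the limit:

        use strict;
        use warnings;
        use Archive::Tar;    # COMPRESS_GZIP needs a reasonably recent Archive::Tar

        my $limit = 2 * 1024**3;    # 2 GiB
        my @files = @ARGV;          # files to archive, given on the command line
        my ($part, $total, @batch) = (0, 0);

        for my $file (@files) {
            my $size = -s $file;
            # Start a new archive when the running (uncompressed) total would pass the limit.
            if (@batch && $total + $size > $limit) {
                Archive::Tar->create_archive(sprintf('backup-%02d.tar.gz', $part++),
                                             COMPRESS_GZIP, @batch);
                ($total, @batch) = (0);
            }
            $total += $size;
            push @batch, $file;
        }
        Archive::Tar->create_archive(sprintf('backup-%02d.tar.gz', $part),
                                     COMPRESS_GZIP, @batch) if @batch;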

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: File Compression
by targetsmart (Curate) on Jun 12, 2009 at 12:57 UTC
    If you are using the ext3 file system (common in Linux), the maximum file size is 16 GiB to 2 TiB, depending on the block size.
    First check whether those compression tools have options for splitting their output;
    otherwise, compress the hundred files, then try to split the result into 2 GB pieces.
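
    Compressing each file individually first is easy enough with a core module; a small sketch (the *.log pattern is just an example) using IO::Compress::Gzip:

        use strict;
        use warnings;
        use IO::Compress::Gzip qw(gzip $GzipError);

        # Compress each file on its own; the resulting .gz files can then be
        # grouped or split however is convenient.
        for my $file (glob('*.log')) {
            gzip $file => "$file.gz"
                or die "gzip failed for $file: $GzipError";
        }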

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.

      "otherwise, compress hundred files, then try to split into 2 GB pieces."

    I'm not sure I follow your advice here: if the OP is complaining that, for whatever reason, they can't write files > 2 GB, how are they going to do this?

      Martin

        If I understood targetsmart right, he didn't mean "create an archive of all the files and then split into 2GB chunks", but "compress the files, then partition them into sufficiently small sets, then create the archives".

        -- 
        Ronald Fischer <ynnor@mm.st>
Re: File Compression
by Transient (Hermit) on Jun 12, 2009 at 13:39 UTC
    As stated above, I don't think you can specify the output chunks beforehand. The brute force methodology would be:
    Pseudo code:

        # foreach $file ( @files ) {
        #     save off a copy of the current compressed file if it exists
        #     attempt to add to current compressed file
        #     if compression fails or new file size > 2GB
        #         rename the saved compressed-file copy to the working copy
        #         start a new compressed file
        #         add file to compressed file
        #     end
        # end foreach
    Of course this runs into problems if a single file compresses to >2GB
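
    One concrete (if slow) rendering of that rollback idea, sketched with Archive::Zip and hypothetical backup-NN.zip names:

        use strict;
        use warnings;
        use Archive::Zip qw(:ERROR_CODES);

        my $limit = 2 * 1024**3;    # 2 GB
        my $part  = 0;
        my $zip   = Archive::Zip->new;

        for my $file (@ARGV) {
            my $member = $zip->addFile($file) or die "can't add $file";
            my $name   = sprintf 'backup-%02d.zip', $part;
            $zip->writeToFileNamed($name) == AZ_OK or die "write of $name failed";

            if (-s $name > $limit) {    # too big: back the file out of this archive...
                $zip->removeMember($member);
                $zip->writeToFileNamed($name) == AZ_OK or die "rewrite of $name failed";
                $part++;                # ...and start a new archive holding just this file
                $zip = Archive::Zip->new;
                $zip->addFile($file) or die "can't add $file";
                $zip->writeToFileNamed(sprintf 'backup-%02d.zip', $part) == AZ_OK
                    or die "write failed";
            }
        }
        # A single file that compresses to more than 2 GB still ends up oversized,
        # exactly the caveat mentioned above.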
      Of course this runs into problems if a single file compresses to >2GB

      At least you know then that on your platform, the size limit can't be 2 GB ;-)

      -- 
      Ronald Fischer <ynnor@mm.st>
        Touché! I should have said: Of course this runs into problems if a single file would otherwise compress to >2GB =D
