How to test if a file is gzipped or not

by Special_K (Monk)
on Jun 17, 2024 at 21:49 UTC

Special_K has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that needs to be able to operate on either a plain text file or a gzipped version of that file. If the file is not gzipped, the file will be opened as follows:


open(FOO, $foo) || die("ERROR: Unable to open file $foo for read: $!");

If the file is gzipped, the file will be opened as follows:


open(FOO, "zcat $foo |") || die("ERROR: Unable to open gzipped file $foo for read: $!");

How do I write a check to determine whether the file is gzipped or not? zcat errors out when called on non-gzipped files.

Replies are listed 'Best First'.
Re: How to test if a file is gzipped or not
by sectokia (Pilgrim) on Jun 17, 2024 at 23:25 UTC

    IO::Uncompress::AnyInflate will gunzip data, and it has a Transparent option. The option lets you choose whether non-gzipped data simply passes through and is returned as if it were the uncompressed result, or whether it raises an error.
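
A minimal sketch of that approach, assuming the file is small enough to slurp. The helper name slurp_maybe_gzipped and the temporary file names are made up for illustration; only IO::Uncompress::AnyInflate, its Transparent option, and IO::Compress::Gzip come from the modules themselves.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# Hypothetical helper: return the text of $path whether or not it is gzipped.
# Transparent => 1 asks the module to pass non-compressed input through as-is.
sub slurp_maybe_gzipped {
    my ($path) = @_;
    my $fh = IO::Uncompress::AnyInflate->new($path, Transparent => 1)
        or die "ERROR: Unable to open file $path for read: $AnyInflateError";
    my $text = '';
    while (defined(my $line = $fh->getline)) {
        $text .= $line;
    }
    $fh->close;
    return $text;
}

# Demo: write the same text as a plain file and as a gzipped file,
# then read both back through the same helper.
my $dir   = tempdir(CLEANUP => 1);
my $plain = "$dir/foo.txt";
my $gz    = "$dir/foo.txt.gz";
my $data  = "line one\nline two\n";

open(my $out, '>', $plain) or die "Cannot write $plain: $!";
print {$out} $data;
close($out);
gzip(\$data => $gz) or die "gzip failed: $GzipError";

print slurp_maybe_gzipped($plain) eq slurp_maybe_gzipped($gz)
    ? "same\n"
    : "different\n";
```

Passing Transparent => 0 instead should make the constructor fail on the plain file, which is the "throw an error" behaviour described above.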

      TIL IO::Uncompress::AnyInflate exists.

      I don't know if it's important for Special_K, but I read under "Transparent":
      In addition, if the input file/buffer does contain compressed data and there is non-compressed data immediately following it, setting this option will make this module treat the whole file/buffer as a single data stream.
      This might pose a security risk, if one isn't aware of it.
Re: How to test if a file is gzipped or not
by LanX (Saint) on Jun 17, 2024 at 22:09 UTC
    Why don't you just catch the error from gunzipping and try proceeding otherwise?

    If that's not enough, gzip files start with a header that includes a magic number (1f 8b).
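
A sketch of the magic-number check, assuming it is enough to look at the first two bytes. The names looks_gzipped and open_text_or_gzip are hypothetical, and gzip -dc is used in place of zcat on the assumption that it behaves the same way on the target system.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path begins with the gzip magic bytes 1f 8b.
sub looks_gzipped {
    my ($path) = @_;
    open(my $fh, '<:raw', $path)
        or die "ERROR: Unable to open file $path for read: $!";
    my $got = read($fh, my $magic, 2);
    close($fh);
    return defined($got) && $got == 2 && $magic eq "\x1f\x8b";
}

# Hypothetical helper: choose between the two opens from the question.
sub open_text_or_gzip {
    my ($path) = @_;
    my $fh;
    if (looks_gzipped($path)) {
        # List-form pipe open: no shell is involved, so odd file names are safe.
        open($fh, '-|', 'gzip', '-dc', $path)
            or die "ERROR: Unable to open gzipped file $path for read: $!";
    }
    else {
        open($fh, '<', $path)
            or die "ERROR: Unable to open file $path for read: $!";
    }
    return $fh;
}
```

This only says the file claims to be gzip; a corrupt or truncated archive would still pass the check and fail later while reading.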

    You could also check whether the available decompression tools have a test option that validates the format.°

    Edit

    °) See documentation for gunzip

    • -t --test
      Test. Check the compressed file integrity.
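
A sketch of calling that test option from Perl, assuming a gzip binary is on the PATH. The helper name passes_gzip_test is made up; anything gzip prints about a bad file goes to STDERR.

```perl
use strict;
use warnings;

# Hypothetical helper: true if `gzip -t` accepts $path as a valid gzip file.
sub passes_gzip_test {
    my ($path) = @_;
    # List-form system bypasses the shell, so $path is passed through verbatim.
    my $status = system('gzip', '-t', $path);
    die "ERROR: Unable to run gzip: $!" if $status == -1;
    return $status == 0;
}
```

Unlike a look at the magic number, -t decompresses the whole file to verify it, so it costs more on large inputs.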

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

Re: How to test if a file is gzipped or not
by Tux (Canon) on Jun 18, 2024 at 08:51 UTC

    Instead of spawning an extra process (zcat), you can also use modules, as already mentioned, for example PerlIO::gzip. Somewhat inconsistently, gzip is also supported through PerlIO::via::gzip; the other two PerlIO compression/decompression layers in that namespace are PerlIO::via::Bzip2 and PerlIO::via::xz.

    On CPAN you can find a plethora of decompressors in the IO::Uncompress namespace, of which IO::Uncompress::Gunzip is one (CORE since 5.9.4).


    Enjoy, Have FUN! H.Merijn
Re: How to test if a file is gzipped or not
by eyepopslikeamosquito (Archbishop) on Jun 18, 2024 at 09:38 UTC
Re: How to test if a file is gzipped or not
by jwkrahn (Abbot) on Jun 17, 2024 at 22:06 UTC

    If you have a magic file, it should tell you what signature a gzip file has, and you can use that to determine whether a file is gzipped.

    Naked blocks are fun! -- Randal L. Schwartz, Perl hacker

      Testing for a file type is called "magic" in linux/unix, hence jwkrahn above mentions the magic file that facilitates translating file signatures to file types.

      On linux/unix you can test with the file command: file $fileyouwanttotest

      CPAN has File::MimeInfo (and also File::MimeInfo::Magic, but the name is a bit misleading)
      and it works like this:

      use File::MimeInfo;
      my $mime_type = mimetype('perl-5.38.0.tar.gz');
      print("$mime_type\n");

      Cheers, Sören

      Créateur des bugs mobiles - let loose once, run everywhere.
      (hooked on the Perl Programming language)

        There's also File::MMagic and File::Type, and many more.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: How to test if a file is gzipped or not
by Anonymous Monk on Jun 17, 2024 at 23:43 UTC
    If zcat throws an error then you know it's not gzipped:
    open(FOO, "zcat $foo |") || open(FOO, $foo) || die("ERROR: Unable to open file $foo for read: $!");
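
One caveat with the fallback idea: a piped open reports success as soon as the child process starts, so a complaint from the decompressor about a non-gzipped file normally shows up when the pipe is closed rather than at open time. A sketch that checks the close status instead, assuming gzip -dc stands in for zcat and that the file fits in memory; the name read_lines_gzip_or_plain is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: try to decompress first, fall back to a plain read.
sub read_lines_gzip_or_plain {
    my ($path) = @_;

    if (open(my $pipe, '-|', 'gzip', '-dc', $path)) {
        my @lines = <$pipe>;
        # close() on a pipe is false when the child exited with a non-zero
        # status, which is how a non-gzipped input is noticed here.
        return @lines if close($pipe);
    }

    open(my $fh, '<', $path)
        or die "ERROR: Unable to open file $path for read: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}
```

The decompressor's own error message still goes to STDERR, and a damaged gzip file would also land in the plain-text branch, so this is best combined with one of the explicit checks from the other replies.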
