PerlMonks  

Question on IO::UnCompress::GunZip...

by biswanath_c (Beadle)
on Jan 17, 2010 at 01:23 UTC
biswanath_c has asked for the wisdom of the Perl Monks concerning the following question:


Hi

I have a requirement like this:

I have a bunch of .gz files, each containing one huge text file whose lines are separated by "\n". I would like to use the IO::Uncompress::Gunzip module to find out exactly how many lines (separated by "\n") each text file contains (each one is compressed into a separate .gz file). How do I do it?

Again, I do not really want to read the contents of the file; I just want to know how many lines there are inside the text file.

What would be the fastest approach to find this out?


Thanks

Biswanath


Re: Question on IO::UnCompress::GunZip...
by biohisham (Priest) on Jan 17, 2010 at 02:06 UTC
Re: Question on IO::UnCompress::GunZip...
by Khen1950fx (Canon) on Jan 17, 2010 at 03:16 UTC
    Multiple lines separated by "\n" would constitute a regular text file; hence, for that I'd do something like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $lines = 0;
    my $filename = '/path/to/tar.gz';
    die "Can't open '${filename}': $!" unless open FILE, $filename;
    while (sysread FILE, my $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

      Thanks for the reply, Khen! But would an "open FILE..." statement work to open a gzip file (.gz)? That is, can I read the compressed file as if it were a normal text file, using the very same set of operations/commands?


        Unfortunately, no. The script can count newlines, but opening and reading the zipped file is a different thing altogether.
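The reply above notes that a plain open/sysread only sees the compressed bytes. A minimal sketch of the same newline-counting loop on top of IO::Uncompress::Gunzip, the module the question asks about, might look like the following. The helper name count_lines_gz and the command-line handling are assumptions made for illustration, not something posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not from the thread): count "\n" characters in the
# decompressed stream of one .gz file, one block at a time.
sub count_lines_gz {
    my ($path) = @_;

    # MultiStream => 1 keeps reading if the file holds several
    # concatenated gzip members.
    my $z = IO::Uncompress::Gunzip->new($path, MultiStream => 1)
        or die "Can't gunzip '$path': $GunzipError\n";

    my $lines  = 0;
    my $buffer = '';
    my $status;

    # read() with one argument decompresses the next block into $buffer,
    # overwriting it. It returns the number of uncompressed bytes,
    # 0 at end of file, and a negative number on error.
    while (($status = $z->read($buffer)) > 0) {
        $lines += ($buffer =~ tr/\n//);
    }
    die "Error reading '$path': $GunzipError\n" if $status < 0;
    $z->close;

    return $lines;
}

# Print "count<TAB>file" for every .gz file named on the command line.
printf "%d\t%s\n", count_lines_gz($_), $_ for @ARGV;
```

Saved under a name of your choosing, it could be run as `perl count_gz_lines.pl *.gz`. The data still has to be decompressed to find the newlines, but only one block is held in memory at a time. A final line without a trailing "\n" is not counted.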
Re: Question on IO::UnCompress::GunZip...
by Khen1950fx (Canon) on Jan 17, 2010 at 20:35 UTC
    For greater precision, here's another way. Download the Archive::Extract tarball, then:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Archive::Extract;

    my $ae = Archive::Extract->new(
        archive => '/path/to/Archive-Extract-0.38.tar.gz' );
    my $ok = $ae->extract( to => '/path/to/Archive-Extract' );
    my $files = $ae->files;
    my $outdir = $ae->extract_path;
    $ae->is_tgz;
    print $outdir, "\n";
    chdir($outdir);

    my $lines = 0;
    my $filename = 'README';
    die "Can't open '${filename}': $!" unless open FILE, $filename;
    while (sysread FILE, my $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;
    print "$lines\n";
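The script above unpacks the tarball to disk before counting. If only the counts are wanted, one possible alternative is to walk the archive in memory with Archive::Tar, a module the thread does not mention. The sketch below is a rough illustration; the helper name count_lines_in_tgz is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper (not from the thread): return a hash reference that
# maps each regular file inside a .tar.gz to its number of "\n" characters.
sub count_lines_in_tgz {
    my ($archive) = @_;

    # iter() walks the tarball one member at a time; the second
    # argument (1) says the archive is compressed.
    my $next = Archive::Tar->iter($archive, 1)
        or die "Can't read '$archive': $Archive::Tar::error\n";

    my %count;
    while (my $member = $next->()) {
        next unless $member->is_file;           # skip directories, links, etc.
        my $content = $member->get_content;
        $content = '' unless defined $content;  # treat missing data as empty
        $count{ $member->full_path } = ($content =~ tr/\n//);
    }
    return \%count;
}

# Print "count<TAB>member" for every archive named on the command line.
for my $archive (@ARGV) {
    my $count = count_lines_in_tgz($archive);
    printf "%d\t%s\n", $count->{$_}, $_ for sort keys %$count;
}
```

Each member is loaded into memory in full before it is counted, so this suits archives of modest files better than a single multi-gigabyte member.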
      Man, perl monks are awesome. I don't even need this information, but it's so cool to see great solutions provided.
