PerlMonks  

Question on IO::UnCompress::GunZip...

by biswanath_c (Beadle)
on Jan 17, 2010 at 01:23 UTC
biswanath_c has asked for the wisdom of the Perl Monks concerning the following question:


Hi

I have a requirement like this:

I have a bunch of .gz files, each containing a huge text file with many lines separated by "\n". I would like to use the IO::Uncompress::Gunzip module to find out exactly how many lines (separated by "\n") each text file contains (each one is compressed into a separate .gz file). How do I do it?

Again, I do not really want to read the contents of the file; I just want to know how many lines there are inside the text file.

What would be the fastest approach to find this out?


Thanks

Biswanath


Re: Question on IO::UnCompress::GunZip...
by biohisham (Priest) on Jan 17, 2010 at 02:06 UTC
Re: Question on IO::UnCompress::GunZip...
by Khen1950fx (Canon) on Jan 17, 2010 at 03:16 UTC
    Multiple lines separated by "\n" would constitute a regular text file; hence, for that I'd do something like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $lines = 0;
    my $filename = '/path/to/tar.gz';
    die "Can't open '${filename}': $!" unless open FILE, $filename;
    while (sysread FILE, my $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

      Thanks for the reply, Khen! But would an "open FILE ..." statement work for opening a gzipped file (.gz)? That is, can I read the .gz file as if it were a normal text file, using the very same set of operations/commands?


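        Unfortunately, no. The script counts newlines in whatever bytes it reads, but a plain open on a .gz file only sees the compressed bytes. The file has to be decompressed before its lines can be counted, and that is a different thing altogether.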
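        To connect this back to the original question, here is a minimal sketch of the same newline-counting loop reading through IO::Uncompress::Gunzip instead of a plain open. The sub name count_lines_gz, the 64 KB read size, and the MultiStream option are choices made for this illustration, not something posted in the thread. As far as I know the gzip format does not record a line count, so the data still has to be decompressed; the sketch just avoids holding the whole file in memory or splitting it into lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: count "\n" characters in the decompressed
# contents of a single .gz file, one chunk at a time.
sub count_lines_gz {
    my ($filename) = @_;

    # MultiStream => 1 is an assumption: it also handles files made by
    # concatenating several gzip streams.
    my $z = IO::Uncompress::Gunzip->new( $filename, MultiStream => 1 )
        or die "Can't gunzip '$filename': $GunzipError\n";

    my $lines = 0;
    my $buffer;
    while (1) {
        # read() returns the number of uncompressed bytes placed in
        # $buffer, 0 at end of file, or a negative number on error.
        my $status = $z->read( $buffer, 65536 );
        die "Error reading '$filename': $GunzipError\n" if $status < 0;
        last if $status == 0;
        $lines += ( $buffer =~ tr/\n// );
    }
    $z->close;
    return $lines;
}

# Usage: perl count_gz_lines.pl file1.gz file2.gz ...
print "$_: ", count_lines_gz($_), "\n" for @ARGV;
```

        Like the script above, this counts "\n" characters, so a final line without a trailing newline is not included in the total.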
Re: Question on IO::UnCompress::GunZip...
by Khen1950fx (Canon) on Jan 17, 2010 at 20:35 UTC
    For greater precision, here's another way. Download the Archive::Extract tarball, then:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Archive::Extract;

    my $ae = Archive::Extract->new(
        archive => '/path/to/Archive-Extract-0.38.tar.gz' );
    my $ok = $ae->extract( to => '/path/to/Archive-Extract' );
    my $files = $ae->files;
    my $outdir = $ae->extract_path;
    $ae->is_tgz;
    print $outdir, "\n";
    chdir($outdir);

    my $lines = 0;
    my $filename = 'README';
    die "Can't open '${filename}': $!" unless open FILE, $filename;
    while (sysread FILE, my $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;
    print "$lines\n";
      Man, perl monks are awesome. I don't even need this information, but it's so cool to see great solutions provided.
