PerlMonks  

Re: reading zipped bzipped files!

by johngg (Abbot)
on Dec 22, 2012 at 00:26 UTC (node #1009965)


in reply to reading zipped bzipped files!

I wondered how this might be achieved, so I decided to have a go. I prepared a ZIP archive containing three bzip2'ed files (saved man page output), stored without further compression.

    $ man ls > ls.man
    $ man xterm > xterm.man
    $ man cp > cp.man
    $ bzip2 -v *.man
      cp.man:     2.496:1,  3.206 bits/byte, 59.93% saved, 5690 in, 2280 out.
      ls.man:     2.644:1,  3.026 bits/byte, 62.18% saved, 8093 in, 3061 out.
      xterm.man:  5.062:1,  1.580 bits/byte, 80.24% saved, 264279 in, 52210 out.
    $ zip -0m mans *.man.bz2
      adding: cp.man.bz2 (stored 0%)
      adding: ls.man.bz2 (stored 0%)
      adding: xterm.man.bz2 (stored 0%)
    $
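For anyone without man or the zip command to hand, a similar archive can probably be built from Perl itself. This is only a sketch: it assumes the core IO::Compress::Bzip2 and IO::Compress::Zip modules, the member names and their short contents are made-up stand-ins for the real man pages, and the archive is kept in a scalar rather than written to mans.zip. Method => ZIP_CM_STORE plays the role of zip -0.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Compress::Zip qw($ZipError :zip_method);

# Made-up stand-ins for the man page text (hypothetical contents).
my %pages = (
    'cp.man' => "CP(1)\nNAME\n       cp - copy files and directories\n",
    'ls.man' => "LS(1)\nNAME\n       ls - list directory contents\n",
);

my $zipData = '';    # archive built in memory, not as mans.zip
my $zip;
for my $name ( sort keys %pages ) {
    # Like "bzip2 *.man": compress each page on its own.
    my $bz;
    bzip2( \ $pages{$name} => \ $bz ) or die "bzip2 failed: $Bzip2Error";

    # Like "zip -0": add the .bz2 bytes as a stored (uncompressed) member.
    # Stream => 0 requests non-streaming mode, so member sizes are
    # recorded in the local headers rather than after the data.
    my %opts = ( Name => "$name.bz2", Method => ZIP_CM_STORE, Stream => 0 );
    if ( defined $zip ) {
        $zip->newStream(%opts) or die "zip failed: $ZipError";
    }
    else {
        $zip = IO::Compress::Zip->new( \ $zipData, %opts )
            or die "zip failed: $ZipError";
    }
    $zip->print($bz);
}
$zip->close() or die "zip failed: $ZipError";

printf "archive holds %d bytes\n", length $zipData;
```

Because the members are stored rather than deflated, each bzip2 stream, starting with its BZh signature, appears byte for byte inside the archive.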

The following script constructs an Archive::Zip object to access the ZIP file and gets a list of member names. Then, for each member, it retrieves the member object and uses that to obtain the (still bzip2'ed) content. A reference to this content, created by an on-the-fly anonymous subroutine, is passed to the IO::Uncompress::Bunzip2 constructor, and the resulting object can then be read line by line. I just print the first six lines of each member file to demonstrate that the method works. I have not incorporated any error checking; this is left as an exercise for the reader.

    use strict;
    use warnings;
    use 5.014;

    use Archive::Zip;
    use IO::Uncompress::Bunzip2;

    my $zipFile = q{mans.zip};
    my $zip     = Archive::Zip->new( $zipFile );

    my @members = $zip->memberNames();
    foreach my $member ( @members )
    {
        say qq{Member: $member};
        my $memberObj = $zip->memberNamed( $member );

        # contents() returns the stored (still bzip2'ed) bytes; the
        # anonymous sub turns them into the scalar reference Bunzip2 reads.
        my $bzFH = IO::Uncompress::Bunzip2->new(
            sub { \ $_[ 0 ] }->( $memberObj->contents() ) );

        my $lineCt = 0;
        # getline() gets no implicit defined() test, so add one.
        while ( defined( my $line = $bzFH->getline() ) )
        {
            last if $lineCt ++ > 5;
            print $line;
        }
    }
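The sub { \ $_[0] }->(...) expression works because the elements of @_ are aliases to the values passed in, so the anonymous sub can return a reference to the string that contents() produced without needing a named variable. The sketch below isolates that trick using only core modules. It is an illustration under assumptions: the hypothetical fake_contents() stands in for the member's contents() call (Archive::Zip is not needed here), and IO::Uncompress::Bunzip2 then reads from the resulting scalar reference.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical stand-in for a member's contents(): returns bzip2'ed
# bytes as a plain string.
sub fake_contents {
    my $text = join '', map { "line $_\n" } 1 .. 10;
    my $bz;
    bzip2( \ $text => \ $bz ) or die "bzip2 failed: $Bzip2Error";
    return $bz;
}

# @_ aliases the returned value, so \ $_[0] is a reference to it.
my $dataRef = sub { \ $_[0] }->( fake_contents() );

# Bunzip2 accepts a scalar reference and decompresses the string it
# points to.
my $bzFH = IO::Uncompress::Bunzip2->new( $dataRef )
    or die "bunzip2 failed: $Bunzip2Error";

my @lines;
while ( defined( my $line = $bzFH->getline() ) ) {
    push @lines, $line;
}
print scalar(@lines), " lines, first: $lines[0]";
```

Storing the result of contents() in a named scalar and passing a reference to that scalar would do the same job at the cost of one extra line.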

The output.

    Member: cp.man.bz2
    CP(1)                     User Commands                     CP(1)
    NAME
           cp - copy files and directories
    Member: ls.man.bz2
    LS(1)                     User Commands                     LS(1)
    NAME
           ls - list directory contents
    Member: xterm.man.bz2
    XTERM(1)                X Window System                  XTERM(1)
    NAME
           xterm - terminal emulator for X

I hope this is useful.

Cheers,

JohnGG


Re^2: reading zipped bzipped files!
by pmqs (Monk) on Dec 22, 2012 at 01:13 UTC
    Here is a variation on a theme that prints the first six lines of each member in the zip file. The difference with this one is that the complete bzip2 member does not need to be read into memory.
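        use strict;
        use warnings;
        use IO::Uncompress::Unzip qw($UnzipError);
        use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

        my $zipFile = q{mans.zip};
        my $zip = IO::Uncompress::Unzip->new( $zipFile )
            or die "Cannot open $zipFile: $UnzipError";

        # nextStream() returns 1 when it moves to the next member,
        # 0 when there are no more members and -1 on error.
        my $status;
        for ( $status = 1; $status > 0; $status = $zip->nextStream() )
        {
            my $name = $zip->getHeaderInfo()->{Name};
            warn "Processing member $name\n";

            # Decompress the bzip2 data straight from the unzip stream.
            my $bzFH = IO::Uncompress::Bunzip2->new( $zip )
                or die "Cannot read $name: $Bunzip2Error";

            my $lineCt = 0;
            while ( my $line = <$bzFH> )
            {
                last if $lineCt ++ > 5;
                print $line;
            }
        }
        die "Error processing $zipFile: $UnzipError" if $status < 0;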
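    To try the streaming approach without the original mans.zip, the sketch below first builds a throwaway in-memory archive of two stored .bz2 members (hypothetical names a.txt.bz2 and b.txt.bz2 holding generated text), then walks it with IO::Uncompress::Unzip and hands the Unzip object straight to IO::Uncompress::Bunzip2, as in the code above. It assumes the core IO::Compress modules and collects at most $maxLines lines per member (a made-up parameter) into a hash instead of printing them.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Compress::Zip qw($ZipError :zip_method);
use IO::Uncompress::Unzip qw($UnzipError);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Build a small in-memory archive of stored .bz2 members (hypothetical
# names and contents) so the sketch runs on its own.
my $zipData = '';
my $zw;
for my $name (qw(a.txt b.txt)) {
    my $text = join '', map { "$name line $_\n" } 1 .. 20;
    my $bz;
    bzip2( \ $text => \ $bz ) or die "bzip2 failed: $Bzip2Error";
    my %opts = ( Name => "$name.bz2", Method => ZIP_CM_STORE, Stream => 0 );
    if ( defined $zw ) {
        $zw->newStream(%opts) or die "zip failed: $ZipError";
    }
    else {
        $zw = IO::Compress::Zip->new( \ $zipData, %opts )
            or die "zip failed: $ZipError";
    }
    $zw->print($bz);
}
$zw->close() or die "zip failed: $ZipError";

# Collect the first $maxLines lines of every member; Bunzip2 pulls
# compressed bytes from the Unzip handle only as it needs them.
my $maxLines = 5;
my %head;
my $unzip = IO::Uncompress::Unzip->new( \ $zipData )
    or die "Cannot open archive: $UnzipError";
my $status;
for ( $status = 1; $status > 0; $status = $unzip->nextStream() ) {
    my $name = $unzip->getHeaderInfo()->{Name};
    my $bzFH = IO::Uncompress::Bunzip2->new($unzip)
        or die "Cannot read $name: $Bunzip2Error";
    $head{$name} = [];
    while ( my $line = <$bzFH> ) {
        push @{ $head{$name} }, $line;
        last if @{ $head{$name} } >= $maxLines;
    }
}
die "Error reading archive: $UnzipError" if $status < 0;

print "$_: ", scalar @{ $head{$_} }, " lines\n" for sort keys %head;
```

    Stopping early and calling nextStream() simply moves on to the next member, so, as pmqs notes, a large member need not be read into memory in full.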

      Excellent ++

      I was trying to come up with a way that avoided reading the whole bzipped file into memory, but I am not yet familiar with the IO::Compress/IO::Uncompress::* modules, as most of my work is on servers running Perl 5.8.x or older and I had only used Archive::Zip before. I will have to study them.

      Thank you for showing me this method :-)

      Cheers,

      JohnGG

Re^2: reading zipped bzipped files!
by mike_gerard (Novice) on Dec 22, 2012 at 20:46 UTC

    Thanks to both of you. I was almost there with my code, but I had this line wrong:

     while ($line = <$zbz2>)

    Of course, my actual code is a bit more complicated because of other little gotchas, but it now works. Have a great Christmas. Mike Gerard
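    A while ($line = <$zbz2>) loop is the safe form when reading from a handle: Perl quietly wraps a readline assignment in a while condition in a defined() test. A method call such as getline() gets no such help, so its loop ends early if the last line is a bare 0 with no trailing newline. The sketch below shows the difference using made-up in-memory data and only core Perl.

```perl
use strict;
use warnings;
use IO::Handle;    # provides getline() on lexical filehandles

# Hypothetical data: the last line is a bare "0" with no newline.
my $data = "first\nsecond\n0";

# <$fh> in a while condition gets an implicit defined() test.
open my $fh, '<', \ $data or die "open: $!";
my $viaReadline = 0;
while ( my $line = <$fh> ) { $viaReadline++ }
close $fh;

# getline() is a plain method call: the false-looking "0" ends the loop.
open $fh, '<', \ $data or die "open: $!";
my $viaGetline = 0;
while ( my $line = $fh->getline() ) { $viaGetline++ }
close $fh;

# An explicit defined() restores the full count.
open $fh, '<', \ $data or die "open: $!";
my $viaDefined = 0;
while ( defined( my $line = $fh->getline() ) ) { $viaDefined++ }
close $fh;

print "readline: $viaReadline, getline: $viaGetline, ",
    "defined getline: $viaDefined\n";
```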
