PerlMonks  

unable to extract same file twice using Archive::Zip

by Pickwick (Beadle)
on Jul 06, 2010 at 06:19 UTC ( [id://848152]=perlquestion )

Pickwick has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have to extract a zip file exactly in the order the files are present in the zip, but some of the files can only be processed with the contents of other files in the zip, which may or may not have been processed already. I tried to solve this by first collecting all members needed for processing these special files; whenever one of the special files comes up, I wanted to simply extract the previously retrieved members as needed.

The problem is that the previously retrieved members may already have been extracted and processed on their own, and every attempt to extract the same member object a second time, to a different filename, fails with AZ_STREAM_END. I only use extractToFileNamed and can't find anything in the docs stating that multiple extractions of the same member are not allowed or not possible.

Because of the error I tried calling endRead() and rewindData() before extracting the member a second time, but this didn't change anything. The files that need to be extracted twice are really small, so extracting some members twice is a lot easier than tracking which members have already been extracted, reusing those files, and cleaning up later.

The only thing that seems to work (the algorithm I wanted to change used to work this way) is to extract all the files I need twice, store the extracted files, and then recreate the zip object using Archive::Zip->new()->read() a second time to process all members. On this pass they are extracted only once and the previously extracted files are used as needed.

Is there really no easy way to just extract a zip member twice?

Replies are listed 'Best First'.
Re: unable to extract same file twice using Archive::Zip
by desemondo (Hermit) on Jul 06, 2010 at 08:11 UTC
    This sounds like another xy...
    I got lost halfway through the first paragraph. What is the bigger picture for what you are trying to achieve?

    I have to extract a zip file exactly in the order the files are present in the zip

    Why do you think that? At face value, that's simply not true. What's the real reason why you (think you) need to process them that way?

    Also, perhaps you can show us a short example of your code that has this problem, to try and make things clearer?
      Why do you think that? At face value, that's simply not true. What's the real reason why you (think you) need to process them that way?
      I really have to, because I have a server which processes the zip files, does some checking and writes a protocol. A client then gets that protocol and does its own checking on the zip contents, using the information from the server's protocol. Server and client both assume that the order of the zip contents is the processing order recorded in the protocol. That can't be changed now.
      Also, perhaps you can show us a short example of your code that has this problem, to try and make things clearer?

      I don't think my code will help you, so I wrote a simpler example.

      my $zip = Archive::Zip->new();
      $zip->read('someFile.zip');
      foreach my $member ($zip->members()) {
          $member->extractToFileNames('foo.txt');
          $member->extractToFileNames('bar.txt');
      }

      The second attempt to extract the same member always seems to fail, and I can't find an example showing whether it is generally possible to extract the same member more than once.

        Using your example it would make more sense to simply copy foo.txt as bar.txt instead of extracting it twice...


        So the issues/requirements you are facing at this point are:
        • I have to extract a zip file exactly in the order the files are present in the zip

        • I must extract the file from the archive twice as I cannot copy the extracted version to the second location

        Forgive me, but I don't understand. I would gladly offer whatever help I can provide, but this problem you're facing doesn't make sense... please clarify.

        PS.
        The files that need to be extracted twice are really small, so extracting some members twice is a lot easier than tracking which members have already been extracted, reusing those files, and cleaning up later.
        For what it's worth, how about this approach:
        1. Obtain your "special" ordering of your zip file members.  my @member_order = $zip->members();
        2. Extract the entire archive to a directory. $zip->extractTree( 'stuff', '/tmp' );
        3. Process/reprocess the files in the extracted temp directory as required, based on their order within @member_order

        If this approach is unsuitable, why not?
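The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it extracts member by member with extractToFileNamed instead of extractTree so that it controls the target names, it builds a tiny in-memory archive in place of the real zip, and the helper extract_in_order, the member names, and the rule that a ".p7s" member needs the file it signs are all hypothetical.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: extract every file member once, in archive order.
# Returns [member name, extracted path] pairs in that same order.
sub extract_in_order {
    my ($zip, $dir) = @_;
    my @extracted;
    for my $member ($zip->members()) {
        next if $member->isDirectory();
        # Number the targets so member names cannot collide inside $dir.
        my $path = File::Spec->catfile($dir, 'member' . scalar(@extracted));
        $member->extractToFileNamed($path) == AZ_OK
            or die 'Cannot extract ' . $member->fileName();
        push @extracted, [ $member->fileName(), $path ];
    }
    return @extracted;
}

# Stand-in archive: the signature is stored before the file it belongs to.
my $zip = Archive::Zip->new();
$zip->addString("signature\n", 'report.pdf.p7s');
$zip->addString("payload\n",   'report.pdf');

my $dir      = tempdir(CLEANUP => 1);
my @order    = extract_in_order($zip, $dir);
my %path_for = map { ($_->[0], $_->[1]) } @order;

# Process in archive order; every extracted file is already on disk,
# so a "special" member can use its companion regardless of position.
my @log;
for my $entry (@order) {
    my ($name) = @$entry;
    if ($name =~ /\A(.+)\.p7s\z/ && exists $path_for{$1}) {
        push @log, "$name: processed together with $1";
    }
    else {
        push @log, "$name: processed on its own";
    }
}
print "$_\n" for @log;
```

Because everything is extracted before processing starts, no member object has to be extracted a second time, and the temporary directory is removed when the script exits.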
Re: unable to extract same file twice using Archive::Zip
by ww (Archbishop) on Jul 06, 2010 at 12:39 UTC
    Your explanation of WHY you need to extract a particular file (or files) a second time fails the "plausibility test" with me... and your simplified example below casts no light on the problem (largely because I can see no rational justification for dividing what sounds like a merely occasional and low-overhead procedure among multiple servers, unless, perhaps, one is a db server?).

    Worse, your simplified code gives no hint of HOW you attempt to unzip a member a second time, nor of HOW you're directing the second attempt to a different directory (to avoid overwriting the first) and nowhere do you cite error messages supporting your description of what's happening. Please clarify.

    See I know what I mean. Why don't you?. The OP, in fact, sounds like you believe there's some sort of quantum entanglement between the files... or "spooky action at a distance."

    If not, why can't you...

    • use a standalone zip program?
    • copy the extracted file(s) that need dupes?
    • Close the zip and reopen for your second attempt?
    or, better, provide a clearer explanation -- with code -- of what you're trying to do.

    If you believe the issue is in your module, the first alternative above would offer you -- at a minimum -- the opportunity to start testing your belief.

    OT: Your " <<quote>" pseudo-tags are not particularly prominent. You may want to see Markup in the Monastery, Writeup Formatting Tips and Perl Monks Approved HTML tags. And, as my usage in this para suggests, one common (local) way of making quotes obvious here is to use italics... or, in the case of a long-ish quote, to use italics inside a <blockquote>...</blockquote> pair.

      Your explanation of WHY you need to extract a particular file (or files) a second time fails the "plausibility test" with me...

      Maybe I should have been more careful with my question, because it's not that I want to discuss why I have to keep the processing order or why there's even a need to do the processing twice, on server and client. I really just wanted to know whether one and the same zip member can be extracted more than once. And by one and the same I mean the same object, no zip reloading, nothing.

      Worse, your simplified code gives no hint of HOW you attempt to unzip a member a second time,

      Of course it does, I have a zip member on which extractToFileNamed is called twice and the second call fails.

      nor of HOW you're directing the second attempt to a different directory (to avoid overwriting the first)

      extractToFileNamed gets a temporary filename each time it's called from File::Temp::tempnam.

      nowhere do you cite error messages supporting your description of what's happening

      The error is AZ_STREAM_END as said before, but I don't understand why. The temporary file is created properly, it just gets no data.

      use a standalone zip program?

      Why should I? Archive::Zip is the better tool for my needs.

      copy the extracted file(s) that need dupes?

      It's no problem to find another approach; I just wondered why something as easy as extracting a zip member object twice caused me trouble.

      Close the zip and reopen for your second attempt?

      Of course I can do that, and I do it now, but why should this be necessary? The member class docs don't say anything about extracting more than once killing people.

      If you believe the issue is in your module

      Of course I don't. ;-)

        I stand corrected re the multiple extractions. Your simplified code does attempt to do so.

        re 'error messages' however, I still don't see them. Executing perldoc Archive::Zip will tell you:

        ERROR CODES
            Many of the methods in Archive::Zip return error codes. These are
            implemented as inline subroutines, using the "use constant" pragma.
            They can be imported into your namespace using the ":ERROR_CODES" tag:

              use Archive::Zip qw( :ERROR_CODES );
              ...
              unless ( $zip->read( 'myfile.zip' ) == AZ_OK ) {
                  die "whoops!";
              }

            AZ_OK (0)
                Everything is fine.
            AZ_STREAM_END (1)
                The read stream (or central directory) ended normally.
            AZ_ERROR (2)
                There was some generic kind of error.
            AZ_FORMAT_ERROR (3)
                There is a format error in a ZIP file being read.
            AZ_IO_ERROR (4)
                There was an IO error.

        The only one of these you cite is "AZ_STREAM_END" which does NOT indicate an error; far less, what kind of error.
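Since these are return values rather than exceptions, a script only sees them if it checks what each call returns. A minimal sketch of such a check, assuming an in-memory archive with a single string member as a stand-in for real data; status_name and extract_or_die are hypothetical helper names, not part of Archive::Zip:

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);
use File::Spec;

# Map the documented status constants to their names for readable messages.
my %STATUS_NAME = (
    AZ_OK()           => 'AZ_OK',
    AZ_STREAM_END()   => 'AZ_STREAM_END',
    AZ_ERROR()        => 'AZ_ERROR',
    AZ_FORMAT_ERROR() => 'AZ_FORMAT_ERROR',
    AZ_IO_ERROR()     => 'AZ_IO_ERROR',
);

# Hypothetical helper: name a status code, or flag it as unknown.
sub status_name {
    my ($code) = @_;
    return exists $STATUS_NAME{$code} ? $STATUS_NAME{$code} : "unknown ($code)";
}

# Hypothetical helper: extract one member and die unless the call returns AZ_OK.
sub extract_or_die {
    my ($member, $target) = @_;
    my $status = $member->extractToFileNamed($target);
    die sprintf("%s -> %s: %s\n", $member->fileName(), $target, status_name($status))
        unless $status == AZ_OK;
    return $status;
}

# Demo on an in-memory archive holding one small string member.
my $dir    = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($dir, 'hello.txt');
my $zip    = Archive::Zip->new();
my $member = $zip->addString("hello\n", 'hello.txt');
my $status = extract_or_die($member, $target);
print 'extraction returned ', status_name($status), "\n";
```

Printing the name of whatever comes back from each extractToFileNamed call would show directly whether the second extraction returns AZ_OK, AZ_STREAM_END, or one of the error codes.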

        However, attempting to execute your alternate code (under w2k with A::Z v130 on perl 5.8 with strict and warnings; do you use these pragmas?) does pop up a message:

        Can't locate object method "extractToFileNames" via package "Archive::Zip::ZipFileMember" at 848152.pl line 13.

        And the meaning of your reply re suspecting the module boils down to interpretation of the wink? Perhaps you should consult the author, Adam Kennedy... but with better evidence than a wink.

Re: unable to extract same file twice using Archive::Zip
by Khen1950fx (Canon) on Jul 06, 2010 at 08:33 UTC
      Actually, it's a FAQ. See: Duplicate files in Zip?

      The file is not stored twice in the zip; I just want to extract one and the same member object more than once, even to different locations.

Re: unable to extract same file twice using Archive::Zip
by Pickwick (Beadle) on Jul 07, 2010 at 10:19 UTC

    I've recreated a test program which shows me error messages on stdout and unzipping more than once really seems to not work for me. I have ActiveState Perl 5.10.1.1007 32 Bit on Win Server 2003 R2 SP2 and Archive::Zip has Version 1.30.

    use strict;
    use Archive::Zip qw(:CONSTANTS :ERROR_CODES);
    use File::Temp;

    my $tmp = $ENV{'TEMP'}.'/zip/extractMember';
    my $zip = Archive::Zip->new();

    if ($zip->read('test.zip') != AZ_OK) {
        die 'Einlesen klappt nicht.';    # German: "Reading failed."
    }

    foreach my $member ($zip->members()) {
        print($member->fileName()."\n");
        $member->extractToFileNamed(File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'));
        $member->extractToFileNamed(File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'));
    }

    Output:

    Anlagen.zip
    format error: CRC or size mismatch while skipping data descriptor
    at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
    Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c754dc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c754dc)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
    Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c754dc)', 'IO::File=GLOB(0x1c7b924)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
    Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c754dc)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\DKbzQX0R...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
    Anlagen.zip.pk7
    format error: CRC or size mismatch while skipping data descriptor
    at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
    Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c7586c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c7586c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
    Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c7586c)', 'IO::File=GLOB(0x1c7b934)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
    Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c7586c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\hrEJi2dV...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
    Testeinreichung AM-SoFT.pdf.p7s
    format error: CRC or size mismatch while skipping data descriptor
    at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
    Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
    Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)', 'IO::File=GLOB(0x1c7bc04)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
    Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\za1J__f2...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
    Testeinreichung AM-SoFT.pdf
    format error: CRC or size mismatch while skipping data descriptor
    at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
    Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
    Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)', 'IO::File=GLOB(0x1c7b704)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
    Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\tqEEW0aI...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
    Testeinreichung AM-SoFT_signed.pdf
    IO error: reading header signature :
    at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 279
    Archive::Zip::ZipFileMember::_readDataDescriptor('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 186
    Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
    Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1887acc)', 'IO::File=GLOB(0x1c7baa4)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
    Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1887acc)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\Nojn_u7h...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18

      Your script (slightly modified) works for me:

      use strict;
      use Archive::Zip qw(:CONSTANTS :ERROR_CODES);
      use File::Temp;

      warn "Using Archive::Zip $Archive::Zip::VERSION";

      my $tmp = $ENV{'TEMP'}.'/zip/extractMember';
      my $zip = Archive::Zip->new();

      if ($zip->read('tmp2.zip') != AZ_OK) {
          die 'Einlesen klappt nicht.';    # German: "Reading failed."
      }

      foreach my $member ($zip->members()) {
          print($member->fileName()."\n");
          my $target = File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
          print "First";
          $member->extractToFileNamed($target);
          print "\n";
          print "Second";
          $target = File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
          $member->extractToFileNamed($target);
          print "\n";
      }
      __END__
      Using Archive::Zip 1.23 at tmp.pl line 5.
      tmp2.txt
      First
      Second

      Note that I'm using an older version of Archive::Zip here. I also don't get the warning about the CRC32 mismatch:

      format error: CRC or size mismatch while skipping data descriptor at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189

      Are you sure that your archive is OK? I created my test archive using 7zip.

      I'll be upgrading to Archive::Zip 1.30 and retest.

      Update: Upgraded to Archive::Zip 1.30 and it still works for me:

      Using Archive::Zip 1.30 at tmp.pl line 5.
      tmp2.txt
      First
      Second

      My guess is that your zip file is broken.

        My guess is that your zip file is broken.

        Seems you are right. I extracted the zip contents using WinRar, created a new zip from them, and Archive::Zip handles it fine. The funny thing is that our software created the original zip itself, using the same version of Archive::Zip, before extracting it with Archive::Zip again. It seems we are doing something wrong while creating the zip, which doesn't affect extraction with other zip programs but does cause trouble in Archive::Zip when extracting a member more than once.

        Thanks!
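One way to test an archive for this behaviour is to extract every member twice from the same member object and compare the two copies. A minimal sketch, assuming a freshly written two-member archive as a stand-in for the real file; slurp and extract_twice_report are hypothetical helpers, and the sketch does not try to reproduce whatever went wrong when the original zip was created:

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);
use File::Spec;

# Read a whole file back in binary mode.
sub slurp {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;
    my $data = <$in>;
    close $in;
    return $data;
}

# Hypothetical helper: extract each member of $zip_path twice from the
# same member object. Returns name => 1 if both extractions returned
# AZ_OK and produced identical files, name => 0 otherwise.
sub extract_twice_report {
    my ($zip_path, $dir) = @_;
    my $zip = Archive::Zip->new();
    $zip->read($zip_path) == AZ_OK or die "Cannot read $zip_path";

    my %report;
    my $n = 0;
    for my $member ($zip->members()) {
        my @paths  = map { File::Spec->catfile($dir, 'copy' . ++$n) } 1 .. 2;
        my @status = map { $member->extractToFileNamed($_) } @paths;
        $report{ $member->fileName() } =
            (      $status[0] == AZ_OK
                && $status[1] == AZ_OK
                && slurp($paths[0]) eq slurp($paths[1]) ) ? 1 : 0;
    }
    return %report;
}

# Stand-in archive, written to disk and read back like the one in the thread.
my $dir      = tempdir(CLEANUP => 1);
my $zip_path = File::Spec->catfile($dir, 'test.zip');
my $fresh    = Archive::Zip->new();
$fresh->addString("first\n",  'a.txt');
$fresh->addString("second\n", 'b.txt');
$fresh->writeToFileNamed($zip_path) == AZ_OK or die "Cannot write $zip_path";

my %report = extract_twice_report($zip_path, $dir);
print "$_: ", ($report{$_} ? 'ok' : 'MISMATCH'), "\n" for sort keys %report;
```

Pointing extract_twice_report at an archive produced by the real creation code, instead of the stand-in, would show whether that code is what produces the problematic files.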

Re: unable to extract same file twice using Archive::Zip
by doug (Pilgrim) on Jul 06, 2010 at 17:17 UTC

    I don't know a thing about Archive::Zip, but if you're having problems extracting the same thing twice into two different places, why not extract the whole archive into a throwaway location (/tmp comes to mind) and just copy what you need to wherever you need it? You can make as many copies as you like. When you're done making copies, just delete the temp location and call it a day.

    Yes, this is an inelegant solution, but it should be easy to implement.

    - doug
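The copy step of this suggestion needs nothing beyond core modules. A minimal sketch, assuming the member has already been extracted once; the file written here only stands in for that extracted data, and duplicate_extracted and the file names are hypothetical:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: copy one already-extracted file to several targets.
# Returns the list of paths that were written.
sub duplicate_extracted {
    my ($source, @targets) = @_;
    for my $target (@targets) {
        copy($source, $target)
            or die "Copy $source -> $target failed: $!";
    }
    return @targets;
}

# Throwaway directory, removed automatically when the script exits.
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'foo.txt');

# Stand-in for the data of a member that was extracted once.
open my $out, '>', $source or die "Cannot write $source: $!";
print {$out} "stand-in for extracted member data\n";
close $out or die "Cannot close $source: $!";

my @copies = duplicate_extracted(
    $source,
    File::Spec->catfile($dir, 'bar.txt'),
    File::Spec->catfile($dir, 'baz.txt'),
);
print "wrote $_\n" for @copies;
```

tempdir with CLEANUP => 1 takes care of the "delete the temp location" step.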
