Pickwick has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I have to extract a zip file in exactly the order in which the files appear in the zip, but some of the files can only be processed with the contents of other files in the zip, which may or may not have been processed already. I tried to solve this by first collecting all the members I need for processing the special files; whenever one of those special files is to be processed, I just wanted to extract the previously collected members as needed.
The problem is that the previously collected members may already have been extracted and processed on their own, and every attempt to extract the same member object a second time to another filename fails with AZ_STREAM_END. I only use extractToFileNamed, and I can't find anything in the docs stating that multiple extractions of the same member are not allowed or not possible.
Because of the error I tried calling endRead() and rewindData() before extracting the member a second time, but this didn't change anything. It's really a lot easier, and the files that need to be extracted twice are really small, to extract some members twice rather than storing which members have already been extracted, reusing those files, cleaning up later, etc.
The only thing that seems to work, because the algorithm I wanted to change used to work this way, is to first extract all the files I need twice, store the extracted files, and then recreate the zip object using Archive::Zip->new()->read() a second time to process all members. This time they are extracted only once and the previously extracted files are used as needed.
Is there really no easy way to just extract a zip member twice?
Re: unable to extract same file twice using Archive::Zip
by desemondo (Hermit) on Jul 06, 2010 at 08:11 UTC
my $zip = Archive::Zip->new();
$zip->read('someFile.zip');
foreach my $member ($zip->members())
{
$member->extractToFileNames('foo.txt');
$member->extractToFileNames('bar.txt');
}
The second attempt to extract the same member always seems to fail, and I can't find an example or a statement saying whether it is generally possible to extract the same member more than once.
Using your example it would make more sense to simply copy foo.txt as bar.txt instead of extracting it twice...
So the issues/requirements you are facing at this point are:
- I have to extract a zip file exactly in the order the files are present in the zip
- I must extract the file from the archive twice as I cannot copy the extracted version to the second location
Forgive me, but I don't understand. I would gladly offer whatever help I can provide, but this problem you're facing doesn't make sense... please clarify.
PS.
It's really a lot easier, and the files that need to be extracted twice are really small, to extract some members twice rather than storing which members have already been extracted, reusing those files, cleaning up later, etc.
For what it's worth, how about this approach:
1. Obtain your "special" ordering of your zip file members. my @member_order = $zip->members();
2. Extract the entire archive to a directory. $zip->extractTree( 'stuff', '/tmp' );
3. Process/reprocess the files in the extracted temp directory as required, based on their order within @member_order
If this approach is unsuitable, why not?
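The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: the archive, its member names (b.txt, a.txt) and the temporary directories are made up so the sketch is self-contained, an empty $root with a $dest ending in a slash is used for extractTree, and Unix-style paths are assumed.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);
use File::Spec;

# Build a small throwaway archive (hypothetical names and content)
# so the sketch does not depend on an existing zip file.
my $work    = tempdir(CLEANUP => 1);
my $zipPath = File::Spec->catfile($work, 'demo.zip');
my $out     = Archive::Zip->new();
$out->addString("first\n",  'b.txt');
$out->addString("second\n", 'a.txt');
$out->writeToFileNamed($zipPath) == AZ_OK or die "cannot write $zipPath";

my $zip = Archive::Zip->new();
$zip->read($zipPath) == AZ_OK or die "cannot read $zipPath";

# 1. Remember the order in which the members appear in the archive.
my @member_order = $zip->memberNames();

# 2. Extract the entire archive once into a temporary directory.
#    The trailing slash keeps member names from being glued to the
#    directory name.
my $dest = tempdir(CLEANUP => 1);
$zip->extractTree('', "$dest/") == AZ_OK or die "extraction failed";

# 3. Process the extracted files in archive order. Any file can be
#    reread from disk as often as needed.
for my $name (@member_order) {
    my $file = File::Spec->catfile($dest, split m{/}, $name);
    next if -d $file;    # skip directory entries
    printf "%s: %d bytes\n", $name, -s $file;
}
```

Since every member is on disk after step 2, a "special" file that depends on other members can simply open them by path, whether or not they have been processed yet.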
Re: unable to extract same file twice using Archive::Zip
by ww (Archbishop) on Jul 06, 2010 at 12:39 UTC
Your explanation of WHY you need to extract a particular file (or files) a second time fails the "plausibility test" with me... and your simplified example below casts no light on the problem, largely because I can see no rational justification for dividing what sounds like a merely occasional and low-overhead procedure among multiple servers (unless, perhaps, one is a db server?). Worse, your simplified code gives no hint of HOW you attempt to unzip a member a second time, nor of HOW you're directing the second attempt to a different directory (to avoid overwriting the first), and nowhere do you cite error messages supporting your description of what's happening. Please clarify.
See I know what I mean. Why don't you? The OP, in fact, sounds as if you believe there's some sort of quantum entanglement between the files... or "spooky action at a distance."
If not, why can't you...
- use a standalone zip program?
- copy the extracted file(s) that need dupes?
- Close the zip and reopen for your second attempt?
or, better, provide a clearer explanation -- with code -- of what you're trying to do.
(If you believe the issue is in your module, the first alternative above would offer you -- at a minimum -- the opportunity to start testing your belief.)
OT: Your " <<quote>" pseudo-tags are not particularly prominent. You may want to see Markup in the Monastery, Writeup Formatting Tips and Perl Monks Approved HTML tags. And, as my usage in this para suggests, one common (local) way of making quotes obvious here is to use italics... or, in the case of a long-ish quote, to use italics inside a <blockquote>...</blockquote> pair.
Your explanation of WHY you need to extract a particular file (or files) a second time fails the "plausibility test" with me...
Maybe I should have been more careful with my question, because I don't want to discuss why I have to keep the processing order or why the processing has to be done twice, on server and client. I really just wanted to know whether one and the same zip member can be extracted more than once. And by one and the same I mean the same object: no zip reloading, nothing.
Worse, your simplified code gives no hint of HOW you attempt to unzip a member a second time,
Of course it does, I have a zip member on which extractToFileNamed is called twice and the second call fails.
nor of HOW you're directing the second attempt to a different directory (to avoid overwriting the first)
extractToFileNamed gets a temporary filename each time it's called from File::Temp::tempnam.
nowhere do you cite error messages supporting your description of what's happening
The error is AZ_STREAM_END as said before, but I don't understand why. The temporary file is created properly, it just gets no data.
use a standalone zip program?
Why should I? Archive::Zip is the better tool for my needs.
copy the extracted file(s) that need dupes?
It's no problem to find another approach; I just wondered why something as easy as extracting a zip member object twice caused me trouble.
Close the zip and reopen for your second attempt?
Of course I can do that, and I do it now, but why should this be necessary? The member class doesn't say anything about extracting more than once killing people.
If you believe the issue is in your module
Of course I don't. ;-)
I stand corrected re the multiple extractions. Your simplified code does attempt to do so.
re 'error messages' however, I still don't see them. Executing perldoc Archive::Zip will tell you:
ERROR CODES
    Many of the methods in Archive::Zip return error codes. These are
    implemented as inline subroutines, using the "use constant" pragma. They
    can be imported into your namespace using the ":ERROR_CODES" tag:

      use Archive::Zip qw( :ERROR_CODES );
      ...
      unless ( $zip->read( 'myfile.zip' ) == AZ_OK ) {
          die "whoops!";
      }

    AZ_OK (0)
        Everything is fine.
    AZ_STREAM_END (1)
        The read stream (or central directory) ended normally.
    AZ_ERROR (2)
        There was some generic kind of error.
    AZ_FORMAT_ERROR (3)
        There is a format error in a ZIP file being read.
    AZ_IO_ERROR (4)
        There was an IO error.
The only one of these you cite is "AZ_STREAM_END" which does NOT indicate an error; far less, what kind of error.
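As an illustration of the status codes quoted above, here is a minimal sketch that extracts the same member twice and prints the named status of each attempt. The archive, the member name hello.txt, the target names and the %status_name lookup table are assumptions made for the example; only the constants and the extractToFileNamed call come from the thread.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);
use File::Spec;

# Map the numeric status codes to their names for readable diagnostics.
my %status_name = (
    AZ_OK()           => 'AZ_OK',
    AZ_STREAM_END()   => 'AZ_STREAM_END',
    AZ_ERROR()        => 'AZ_ERROR',
    AZ_FORMAT_ERROR() => 'AZ_FORMAT_ERROR',
    AZ_IO_ERROR()     => 'AZ_IO_ERROR',
);

# Build a throwaway archive (hypothetical name and content).
my $dir     = tempdir(CLEANUP => 1);
my $zipPath = File::Spec->catfile($dir, 'demo.zip');
my $out     = Archive::Zip->new();
$out->addString("hello\n", 'hello.txt');
$out->writeToFileNamed($zipPath) == AZ_OK or die "cannot write $zipPath";

my $zip = Archive::Zip->new();
$zip->read($zipPath) == AZ_OK or die "cannot read $zipPath";

# Extract each member twice to different targets and record the
# status returned by every call instead of ignoring it.
my @statuses;
for my $member ($zip->members()) {
    for my $attempt (1, 2) {
        my $target = File::Spec->catfile($dir, "copy$attempt.txt");
        my $status = $member->extractToFileNamed($target);
        push @statuses, $status;
        printf "%s, attempt %d: %s\n",
            $member->fileName(), $attempt, $status_name{$status} // $status;
    }
}
```

Printing the status of each call separates "the call reported a problem" from "the file ended up empty", which is the kind of evidence asked for here.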
However, attempting to execute your alternate code (under w2k with A::Z v130 on perl 5.8 with strict and warnings; do you use these pragmas?) does pop up a message:
Can't locate object method "extractToFileNames" via package "Archive::Zip::ZipFileMember" at 848152.pl line 13.
And the meaning of your reply re suspecting the module boils down to interpretation of the wink? Perhaps you should consult the author, Adam Kennedy... but with better evidence than a wink.
Re: unable to extract same file twice using Archive::Zip
by Khen1950fx (Canon) on Jul 06, 2010 at 08:33 UTC
Re: unable to extract same file twice using Archive::Zip
by Pickwick (Beadle) on Jul 07, 2010 at 10:19 UTC
I've written a test program that prints the error messages to stdout, and unzipping a member more than once really does not seem to work for me. I have ActiveState Perl 5.10.1.1007 32 Bit on Win Server 2003 R2 SP2, and Archive::Zip is version 1.30.
use strict;
use Archive::Zip qw(:CONSTANTS :ERROR_CODES);
use File::Temp;

my $tmp = $ENV{'TEMP'}.'/zip/extractMember';
my $zip = Archive::Zip->new();
if ($zip->read('test.zip') != AZ_OK)
{
    die 'Could not read the archive.';
}
foreach my $member ($zip->members())
{
    print($member->fileName()."\n");
    # Extract the same member twice, each time to a fresh temporary name.
    $member->extractToFileNamed(File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'));
    $member->extractToFileNamed(File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'));
}
Output:
Anlagen.zip
format error: CRC or size mismatch while skipping data descriptor
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c754dc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c754dc)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c754dc)', 'IO::File=GLOB(0x1c7b924)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c754dc)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\DKbzQX0R...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
Anlagen.zip.pk7
format error: CRC or size mismatch while skipping data descriptor
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c7586c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c7586c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c7586c)', 'IO::File=GLOB(0x1c7b934)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c7586c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\hrEJi2dV...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
Testeinreichung AM-SoFT.pdf.p7s
format error: CRC or size mismatch while skipping data descriptor
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)', 'IO::File=GLOB(0x1c7bc04)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c75a3c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\za1J__f2...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
Testeinreichung AM-SoFT.pdf
format error: CRC or size mismatch while skipping data descriptor
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)', 'IO::File=GLOB(0x1c7b704)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1c75c0c)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\tqEEW0aI...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
Testeinreichung AM-SoFT_signed.pdf
IO error: reading header signature :
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 279
Archive::Zip::ZipFileMember::_readDataDescriptor('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 186
Archive::Zip::ZipFileMember::_skipLocalFileHeader('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 395
Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1887acc)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 990
Archive::Zip::Member::extractToFileHandle('Archive::Zip::ZipFileMember=HASH(0x1887acc)', 'IO::File=GLOB(0x1c7baa4)') called at C:/Programme/Perl/lib/Archive/Zip/Member.pm line 488
Archive::Zip::Member::extractToFileNamed('Archive::Zip::ZipFileMember=HASH(0x1887acc)', 'D:\Benutzer\TSCHOE~1\LOKALE~1\Temp\zip\extractMember\Nojn_u7h...') called at D:/Benutzer/tschoening/Eigene Dateien/Eclipse/Perltests/StandAlone/zip/extractMember.pl line 18
use strict;
use Archive::Zip qw(:CONSTANTS :ERROR_CODES);
use File::Temp;

warn "Using Archive::Zip $Archive::Zip::VERSION";
my $tmp = $ENV{'TEMP'}.'/zip/extractMember';
my $zip = Archive::Zip->new();
if ($zip->read('tmp2.zip') != AZ_OK)
{
    die 'Could not read the archive.';
}
foreach my $member ($zip->members())
{
    print($member->fileName()."\n");
    # First extraction of the member.
    my $target = File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
    print "First";
    $member->extractToFileNamed($target);
    print "\n";
    # Second extraction of the same member object to a new target.
    print "Second";
    $target = File::Temp::tempnam($tmp, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
    $member->extractToFileNamed($target);
    print "\n";
}
__END__
Using Archive::Zip 1.23 at tmp.pl line 5.
tmp2.txt
First
Second
Note that I'm using an older version of Archive::Zip here. I also don't get the warning about the CRC32 mismatch:
format error: CRC or size mismatch while skipping data descriptor
at C:/Programme/Perl/lib/Archive/Zip/ZipFileMember.pm line 189
Are you sure that your archive is OK? I created my test archive using 7zip.
I'll be upgrading to Archive::Zip 1.30 and retest.
Update: Upgraded to Archive::Zip 1.30 and it still works for me:
Using Archive::Zip 1.30 at tmp.pl line 5.
tmp2.txt
First
Second
My guess is that your zip file is broken.
My guess is that your zip file is broken.
Seems you are right. I extracted the zip contents using WinRar, created a new zip from them, and Archive::Zip handles it fine. The funny thing is that our software created the original zip itself, using the same version of Archive::Zip, before extracting it with Archive::Zip again. It seems we are doing something wrong while creating the zip, which doesn't affect extracting the files with other zip programs but does cause trouble in Archive::Zip when extracting more than once.
Thanks!
Re: unable to extract same file twice using Archive::Zip
by doug (Pilgrim) on Jul 06, 2010 at 17:17 UTC
I don't know a thing about Archive::Zip, but if you're having problems extracting the same thing twice into two different places, why not extract the whole archive into a throwaway location (/tmp comes to mind) and just copy what you need to wherever you need it? You can make as many copies as you like. When you're done making copies, just delete the temp location and call it a day.
Yes, this is an inelegant solution, but it should be easy to implement.
- doug
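A rough sketch of the extract-once-and-copy idea, assuming File::Copy for the copying and File::Temp for the throwaway location. The archive, the member name needed.txt and the copy names first.txt and second.txt are hypothetical and exist only to make the example runnable.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Copy qw(copy);
use File::Temp qw(tempdir);
use File::Spec;

# Throwaway location; CLEANUP removes it when the program exits.
my $scratch = tempdir(CLEANUP => 1);

# Build a small demo archive (hypothetical name and content).
my $zipPath = File::Spec->catfile($scratch, 'demo.zip');
my $out     = Archive::Zip->new();
$out->addString("payload\n", 'needed.txt');
$out->writeToFileNamed($zipPath) == AZ_OK or die "cannot write $zipPath";

my $zip = Archive::Zip->new();
$zip->read($zipPath) == AZ_OK or die "cannot read $zipPath";

# Extract the member exactly once...
my $member    = $zip->memberNamed('needed.txt') or die "member not found";
my $extracted = File::Spec->catfile($scratch, 'needed.txt');
$member->extractToFileNamed($extracted) == AZ_OK or die "extraction failed";

# ...then make as many copies as needed from the extracted file.
my @copies = map { File::Spec->catfile($scratch, $_) } qw(first.txt second.txt);
for my $copy (@copies) {
    copy($extracted, $copy) or die "copy to $copy failed: $!";
}
```

Each member is read from the archive only once this way, so the result does not depend on whether a second extraction of the same member object succeeds.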