in reply to Seeking help with Extracting files from zip

Everything works perfectly as long as my zip file has files named with Latin characters, but things get worse when the names are Chinese or Japanese.

If you can answer a couple of questions, it may give us the information that would allow us to actually help you...

Oh, one other thing: the following code works for me (Perl 5.10.1, debian oldstable amd64):

nathan@warthog:~/test2/extract$ ls
nathan@warthog:~/test2/extract$ perl -e '
  $filename = "";
  $dest_dir = "/home/nathan/test2/extract";
  use Archive::Zip qw(:ERROR_CODES);   # imports AZ_OK
  my $zip = Archive::Zip->new();
  local $Archive::Zip::UNICODE = 1;    # treat member names as UTF-8
  unless ( $zip->read($filename) == AZ_OK ) {
    die "Error Reading Zip File !";
  }
  foreach my $m ($zip->members()) {
    print "Member $m:\n ";
    my $err = $zip->extractMemberWithoutPaths( $m, "$dest_dir/" . $m->fileName);
    print "Error: $err" if $err;
    print $/;
  }'
Member Archive::Zip::ZipFileMember=HASH(0xdfdd30):
Member Archive::Zip::ZipFileMember=HASH(0xdfe2b8):
Member Archive::Zip::ZipFileMember=HASH(0xdfe5a0):
Member Archive::Zip::ZipFileMember=HASH(0xdfe888):
Member Archive::Zip::ZipFileMember=HASH(0xdfeb98):
Member Archive::Zip::ZipFileMember=HASH(0xdfee80):
nathan@warthog:~/test2/extract$ ls
한국어  ગુજરાતી  ಕನ್ನಡ  বাংলা  中文  日本語
nathan@warthog:~/test2/extract$
(Perlmonks seems unable or perhaps unwilling to handle most of those characters -- and if unwilling I can't blame them; this is by design an English-language venue -- but they display just fine on my terminal when I do the ls. Of course, I created my zip file using the zip program that comes with Debian; yours may have been created using different software...)

Re^2: Seeking help with Extracting files from zip
by aksjain (Acolyte) on Jan 14, 2015 at 14:37 UTC

    Thanks for your reply. By "worse" I mean the filename characters get mangled. I tried extracting the same zip file using Windows tools like WinRAR, and it extracts the files with the proper names, as it should. I am using Windows 7 and have the Japanese and Chinese language packs installed on the machine. Below is the link to an image which shows the difference in the name of the folder.

      Ok, so I assume the katakana filename there is what it's supposed to look like, and the gibberish filename with more than twice as many characters, most of which look like they came from the miscellaneous-symbols-and-accented-characters section of an eight-bit character set, is the result of running your code?

      This definitely looks like a charset translation issue. The Archive::Zip documentation indicates that setting UNICODE causes the filenames in the archive to be treated as UTF-8. Perhaps they're not? Maybe they're UTF-16 or UTF-32 or some other Unicode encoding (or, heaven help you, some pre-Unicode Asian encoding like Shift-JIS or whatnot)? If you can figure out what fiddling needs to be done to preserve the encoding, you can pass the correct filename to extractMemberWithoutPaths and that should probably work, I think...
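
      For what it's worth, here is a minimal sketch of that kind of fiddling using the core Encode module. It assumes the stored names are either UTF-8 or CP932 (the Shift-JIS variant used on Japanese Windows); the candidate list, the sample bytes, and the guess_decode helper are all hypothetical, not something Archive::Zip provides. It tries each candidate with a strict decode and keeps the first one that succeeds.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Try each candidate encoding in order and return the first strict decode
# that succeeds, together with the encoding name. Returns an empty list
# if none of the candidates fit. (Hypothetical helper.)
sub guess_decode {
    my ($bytes, @candidates) = @_;
    for my $enc (@candidates) {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops it from modifying $bytes between attempts.
        my $chars = eval { decode($enc, $bytes, FB_CROAK | LEAVE_SRC) };
        return ($chars, $enc) if defined $chars;
    }
    return;
}

# Made-up sample: the raw bytes a member name might contain if the
# archive was written with CP932 names.
my $raw = encode('cp932', "\x{65E5}\x{672C}\x{8A9E}");    # "日本語"

my ($name, $enc) = guess_decode($raw, 'UTF-8', 'cp932');
die "could not decode member name\n" unless defined $name;

# Re-encode for the target filesystem (assumed here to want UTF-8).
my $fs_name = encode('UTF-8', $name);
printf "decoded as %s, %d characters\n", $enc, length $name;
```

      The decoded name (re-encoded into whatever your filesystem actually expects, which is an assumption you would have to check on Windows) is what you would then use to build the destination path for extractMemberWithoutPaths. Order matters: many byte strings are valid in more than one legacy encoding, so the strictest candidate (UTF-8) goes first.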

      Unfortunately, I don't know that much about the details of the character sets involved, but maybe someone else will come along now and be able to recognize what's going on. (Even just being able to recognize which encoding is being erroneously treated as though it were some other encoding would go a long way toward figuring out the problem.) That image you provided should help.
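
      One way to narrow it down, sketched here under assumptions: take the name you expect, encode it the way the archive might have stored it, then deliberately decode those bytes with a wrong single-byte code page and compare the result against the gibberish you actually get. The sample name and both lists of encodings below are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the name the file is supposed to have.
my $expected = "\x{30C6}\x{30B9}\x{30C8}";    # katakana "テスト"

# For each (stored encoding, wrongly assumed encoding) pair, record how
# long the mangled name comes out.
my %length_for;
for my $stored (qw(UTF-8 cp932)) {
    for my $misread (qw(cp1252 cp437)) {
        my $bytes   = encode($stored, $expected);
        my $garbled = decode($misread, $bytes);
        $length_for{"$stored as $misread"} = length $garbled;
    }
}

printf "%-16s %d characters\n", $_, $length_for{$_}
    for sort keys %length_for;
```

      The character count alone is a clue: for kana or kanji, roughly three times as many characters as expected suggests UTF-8 bytes being read one byte at a time, while roughly twice as many suggests a double-byte encoding such as CP932.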

Re^2: Seeking help with Extracting files from zip
by aksjain (Acolyte) on Jan 19, 2015 at 11:25 UTC

    Thanks a lot, jonadab. The solution you suggested worked seamlessly. Thanks a lot.

    Can you please help me with another question asked on ??