Re: How to fix wrongly encoded filenames?

by graff (Chancellor)
on Mar 18, 2014

in reply to How to fix wrongly encoded filenames?

Do you still have the original tar file that came from the Unix system? If so, you should be able to open it with Archive::Tar and get the raw byte strings of the file names. If they really are encoded as iso-8859-1, then it's trivial to decode those strings to utf8 (and, if necessary, re-encode them to whatever works on your Windows server).
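Here is a minimal sketch of that first step, assuming the names really are iso-8859-1. The archive name 'archive.tar' and the helper decode_name() are made up for illustration. Note that Archive::Tar->new reads the whole archive into memory, so this is only meant to show what the decoding looks like:

```perl
use strict;
use warnings;
use Archive::Tar;
use Encode qw(decode);

# Hypothetical archive name; replace with the real tarball.
my $tar_path = 'archive.tar';

# Turn a raw ISO-8859-1 byte string into a Perl character string.
# (Assumption: the names in the tar file are ISO-8859-1.)
sub decode_name {
    my ($raw) = @_;
    return decode('ISO-8859-1', $raw);
}

if (-e $tar_path) {
    my $tar = Archive::Tar->new($tar_path)
        or die "Cannot read $tar_path: ", Archive::Tar->error;

    # Print the decoded names as UTF-8 so they can be checked by eye.
    binmode STDOUT, ':encoding(UTF-8)';
    print decode_name($_), "\n" for $tar->list_files;
}
```

If the printed names look right, the iso-8859-1 guess is probably correct; if they come out garbled, the source encoding is something else.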

If that's possible, then maybe you want to just delete the first attempt from the Windows server and try again using Archive::Tar (instead of 7zip, whatever that is). You can iterate through the tar file, decode the non-ASCII names into Perl-internal utf8 (and re-encode for Windows if necessary), then create directories and files on the server filesystem as needed to unpack the tar contents.
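Something along these lines might do for the unpacking loop. It is a rough sketch, not tested against your data. The archive name, the output directory, the helper target_path() and the 'cp1252' target encoding are all assumptions; what the Windows file API actually accepts depends on the server's code page. It uses Archive::Tar->iter, which hands back one member at a time rather than loading the whole archive:

```perl
use strict;
use warnings;
use Archive::Tar;
use Encode qw(decode encode);
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;

# Assumptions: hypothetical paths, ISO-8859-1 source names, and a target
# filesystem that accepts cp1252-encoded names.
my ($tar_path, $dest_dir, $fs_encoding) = ('archive.tar', 'unpacked', 'cp1252');

# Map a raw tar member name to an encoded path below $dest.
sub target_path {
    my ($raw_name, $dest, $enc) = @_;
    my $chars = decode('ISO-8859-1', $raw_name);            # bytes -> characters
    my $path  = File::Spec->catfile($dest, split m{/}, $chars);
    return encode($enc, $path);                             # characters -> target bytes
}

if (-e $tar_path) {
    my $next = Archive::Tar->iter($tar_path)
        or die "Cannot open $tar_path: ", Archive::Tar->error;

    while (my $f = $next->()) {
        # Skip members that try to climb out of the destination directory.
        next if grep { $_ eq '..' } split m{/}, $f->full_path;

        my $out = target_path($f->full_path, $dest_dir, $fs_encoding);
        if ($f->is_dir) {
            make_path($out);
        }
        elsif ($f->is_file) {
            make_path(dirname($out));
            open my $fh, '>:raw', $out or die "Cannot write $out: $!";
            my $data = $f->get_content;
            print {$fh} defined $data ? $data : '';
            close $fh or die "Cannot close $out: $!";
        }
    }
}
```

Names with characters that the target encoding cannot represent will be mangled by encode(), so check the results on a small sample first.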

Who knows, maybe you'll want to decode/recode the file contents while you're at it.

Re^2: How to fix wrongly encoded filenames?
by Anonymous Monk on Mar 18, 2014 at 06:17 UTC
    That is what was tried before. It fails for two reasons. First, performance, because you have to handle each file separately, and there may be up to 5000 files.

    Second, size: the tarballs to be handled are several gigabytes each.

