Problem with Archive::Tar created archives and Winzip

by smahesh (Pilgrim)
on Jul 20, 2007 at 09:11 UTC ( #627736=perlquestion )
smahesh has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

Winzip and Archive::Tar do not play well together. I hit a problem while writing a small utility that, among other things, recursively tars and gzips a directory. To support systems that do not ship with a built-in 'tar' utility (MS Windows), the script falls back to Archive::Tar. I could not work out what was wrong with my code and sought the help of monks on the ChatterBox.

This writeup is intended to:

  1. Ensure other monks using Archive::Tar do not end up repeating the same mistakes.
  2. More importantly, show that the problem is not always in the code itself. Keep an open mind when investigating an issue and do not prejudge the code.

The original problem:
I was able to successfully create an archive file 'data.tar.gz', but opening the file in WinZip showed that the archive did not preserve the folder/path structure. The archive was basically "flattened" out. I needed to preserve the file structure inside the archive.

My Code:
The following is a stripped down version of the two approaches for creating the archive file:

    #!/usr/bin/perl
    #
    # Sample test script
    #
    use strict;
    use warnings;
    use Archive::Tar;
    use File::Find;

    $Archive::Tar::DEBUG=1;

    my $srcDir = 'C:\Temp\data';

    ## Attempt 1 - add the files to the archive in the callback code.
    my $archive = Archive::Tar->new();
    find(\&callback1, $srcDir);
    $archive->write('one.tar.gz', 9);
    print "---------ONE-------------------\n";
    print join("\n", $archive->list_files());
    print "---------ONE-------------------\n";

    ## Attempt 2 - prepare a list of files and add to the archive in one place
    my @files = ();
    # not using $archive->clear() - I want this to be independent of previous
    # attempt
    $archive = Archive::Tar->new();
    find(\&callback2, $srcDir);
    $archive->create_archive('two.tar.gz', 9, @files);
    print "---------TWO-------------------\n";
    print join("\n", @files);
    print "---------TWO-------------------\n";

    sub callback1() {
        $archive->add_files($File::Find::name);
    }

    sub callback2() {
        push(@files, $File::Find::name);
    }

The output of above:
    ---------ONE-------------------
    C:/Temp/data
    C:/Temp/data/SpaceMonger.exe
    C:/Temp/data/spacemonger_README.TXT
    C:/Temp/data/tcpvcon.exe
    C:/Temp/data/bar
    C:/Temp/data/bar/Tcpview.exe
    C:/Temp/data/bar/TCPVIEW.HLP
    C:/Temp/data/bar/tcpview_README.TXT
    C:/Temp/data/foo
    C:/Temp/data/foo/nc.exe
    C:/Temp/data/foo/nc_license.txt
    C:/Temp/data/foo/nc_readme.txt---------ONE-------------------
    ---------TWO-------------------
    C:\Temp\data
    C:\Temp\data/SpaceMonger.exe
    C:\Temp\data/spacemonger_README.TXT
    C:\Temp\data/tcpvcon.exe
    C:\Temp\data/bar
    C:\Temp\data/bar/Tcpview.exe
    C:\Temp\data/bar/TCPVIEW.HLP
    C:\Temp\data/bar/tcpview_README.TXT
    C:\Temp\data/foo
    C:\Temp\data/foo/nc.exe
    C:\Temp\data/foo/nc_license.txt
    C:\Temp\data/foo/nc_readme.txt---------TWO-------------------

The learnings from this experience:
The code is very simple and still it did not work. I was sure I was missing something obvious and trivial. I looked at the code, debugged it, went through the documentation, asked monks on the CB and followed their suggestions. I could not make progress until Corion pointed out a possible issue: it seems that Winzip cannot correctly interpret the archives created by Archive::Tar. It opens the tar file but does not interpret the file paths correctly and shows them as "blank". This undocumented "feature" gives the wrong impression that the archive does not preserve file paths, so the developer is misled into thinking that something is wrong with the Perl code. Testing the archive on a Unix machine showed the file structure correctly. The culprit in this scenario was Winzip.

When using Archive::Tar in a Windows environment, test the created archive using a non-Winzip utility or in a *nix environment. Sometimes you may be looking for the issue in the wrong location: here it turned out that the verification step was defective. The code was OK and conformed to the module documentation. I am not ruling out bugs in Archive::Tar, but it was not immediately obvious to me **where** the problem was located. This was a good lesson to learn.
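
One way to take the viewer out of the equation is to read the archive back with Archive::Tar itself. The following is only a minimal sketch: the member names and contents are made up, the archive is written uncompressed to a temporary file, and add_data() is used so that no real files are needed.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Temp qw(tempdir);

# Throwaway archive with made-up member names; add_data() avoids needing
# real files on disk.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/check.tar";

my $out = Archive::Tar->new;
$out->add_data('data/foo/nc_readme.txt', "readme\n");
$out->add_data('data/bar/TCPVIEW.HLP',   "help\n");
$out->write($file) or die $out->error;

# Independent check: read the file back and list the stored paths.
my $in = Archive::Tar->new;
$in->read($file) or die $in->error;
my @names = $in->list_files;
print "$_\n" for @names;
```

If the paths come back intact here but look flattened in a GUI tool, the tool is the more likely suspect.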

Thanks to Zaxo, Corion, bart and Intrepid on CB for not only helping me out in identifying the problem but also for providing suggestions on how to improve the utility and alternate implementation ideas.


Re: Problem with Archive::Tar created archives and Winzip
by syphilis (Canon) on Jul 20, 2007 at 10:27 UTC
    Hi smahesh,

    I think the following, found in the FAQ section of the Archive::Tar documentation, is relevant here:
    I'm using WinZip, or some other non-POSIX client, and files are not being extracted properly!
    By default, "Archive::Tar" is in a completely POSIX-compatible mode, which uses the POSIX-specification of "tar" to store files. For paths greater than 100 characters, this is done using the "POSIX header prefix". Non-POSIX-compatible clients may not support this part of the specification, and may only support the "GNU Extended Header" functionality. To facilitate those clients, you can set the $Archive::Tar::DO_NOT_USE_PREFIX variable to "true". See the "GLOBAL VARIABLES" section for details on this variable.
    Did you try setting the $Archive::Tar::DO_NOT_USE_PREFIX variable to "true"? Admittedly the FAQ seems to be describing a slightly different issue, but (IIRC) setting $Archive::Tar::DO_NOT_USE_PREFIX to "true" does fix the "flattening" issue you describe.
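
A minimal sketch of that suggestion, with a made-up member name and placeholder contents. It writes the archive to a string (write() with no filename) only so the header can be inspected; in the original script the call would still be write('one.tar.gz', 9).

```perl
use strict;
use warnings;
use Archive::Tar;

# Set the global before writing the archive.
$Archive::Tar::DO_NOT_USE_PREFIX = 1;

my $tar = Archive::Tar->new;
# add_data() takes a member name and its contents, so no real file is needed.
$tar->add_data('data/foo/nc_readme.txt', "placeholder\n");

# Called without a filename, write() returns the archive as a string.
my $raw = $tar->write;

# The first 100 bytes of a tar header are the NUL-padded name field.
my $name_field = unpack 'Z100', $raw;
print "name field: $name_field\n";
```

With the variable set, the whole path should end up in the name field, which is the only field a prefix-unaware reader looks at.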


      Hi syphilis++,

      I tried your suggestion and it works. With the modification, Winzip correctly reports the full paths.

      Now, I am investigating a way of getting relative paths. In my example, I am tarring the 'C:\Temp\data' directory. I want only paths under the 'data' directory to be displayed, i.e. paths relative to $srcDir.
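
One possible approach to relative paths, sketched under assumptions: a temporary directory with made-up files stands in for C:\Temp\data, only plain files are added, and each entry is renamed to its path relative to $srcDir right after add_files() returns it. This is an illustration, not the solution the thread settled on.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Stand-in for C:\Temp\data: a throwaway tree with made-up files.
my $srcDir = tempdir(CLEANUP => 1);
make_path("$srcDir/foo");
for my $rel ('top.txt', 'foo/nested.txt') {
    open my $fh, '>', "$srcDir/$rel" or die "$srcDir/$rel: $!";
    print {$fh} "sample\n";
    close $fh;
}

my $tar = Archive::Tar->new;

find({
    no_chdir => 1,    # keep $File::Find::name usable as a path from here
    wanted   => sub {
        return unless -f $File::Find::name;
        my $rel = File::Spec->abs2rel($File::Find::name, $srcDir);
        $rel =~ tr{\\}{/};    # tar member names use forward slashes
        # add_files() returns the Archive::Tar::File object(s) it added
        my ($entry) = $tar->add_files($File::Find::name)
            or die $tar->error;
        $entry->rename($rel);
    },
}, $srcDir);

my @members = sort $tar->list_files;
print "$_\n" for @members;
```

Changing into the parent directory and adding relative names directly would be another option; renaming keeps the script independent of the current working directory.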


Re: Problem with Archive::Tar created archives and Winzip
by poqui (Deacon) on Jul 20, 2007 at 14:46 UTC
    I just wanted to say that I am glad to see such a well structured and well written node. This is a great example of what a good node should look like.

    Great job.
Re: Problem with Archive::Tar created archives and Winzip
by archfool (Monk) on Jul 20, 2007 at 14:54 UTC
    This node and its children would make great entries for Categorized Questions and Answers.
Re: Problem with Archive::Tar created archives and Winzip
by demerphq (Chancellor) on Jul 22, 2007 at 14:32 UTC

    Jos broke Archive::Tar when he took it over and POSIXized it. And since he doesn't use Windows and considers Solaris a higher priority platform, he refuses to change it back. Over a year ago I posted a patch to AT that would restore its previous gnu'ishness, yet it was rejected because his friends told him it would break in their environments. Despite the absence of evidence actually demonstrating the problem, he refused to apply the patch on the basis of his friends' opinion. (This is all publicly available for review if you don't believe me.)

    The core problem is that there are several different flavours of tar, all of which are slightly incompatible with each other in the area of handling long file names.

    The original tar spec supports filenames and paths of up to around 100 chars (I forget the exact size of the field). When storing a path longer than this, the original spec stipulated you would use the rightmost 100 chars. Then GNU invented a better way: it uses a special tag as the filename that says the filename for the NEXT file is in the data for the special tag. This format has no arbitrary upper limit on the size of the filename and path.

    Unfortunately, however, POSIX decided to adopt a different and much stupider scheme: add a new fixed-size prefix field (155 bytes in the ustar header) to store the path. So in this scheme the original field holds just the filename, and the new field holds the path. Of course this means that filename and path are both size restricted, to 100 and 155 characters respectively, for a combined limit of roughly 255 chars. It is this scheme that Jos took with Archive::Tar. Except he didn't do it right. POSIX actually recommends that you should use the extra path field ONLY when the combined filename and path is longer than 100 characters. This is to promote maximum portability with non-POSIX-compliant implementations (such as pretty much EVERY older tar out there). However, Jos didn't do this. He just sticks the filename in the filename slot and the path into the path slot and is done.

    Now Winzip doesn't know about the POSIX format (or didn't last time I checked); certainly older versions don't. So it just reads the filename, ignores the newfangled path field and flattens the entire archive down to a single directory.
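
The flattening can be illustrated with a toy header walker. This is a sketch, not how Winzip works internally: header_fields() is a hypothetical helper, the member names are made up, and the offsets are the standard ustar layout (name in bytes 0..99, size at byte 124, prefix in the 155 bytes starting at byte 345). What the default mode prints depends on the installed Archive::Tar version.

```perl
use strict;
use warnings;
use Archive::Tar;

# Walk the 512-byte headers of an uncompressed tar held in a string and
# return [name, prefix] pairs (hypothetical helper, ustar offsets).
sub header_fields {
    my ($raw) = @_;
    my @fields;
    my $pos = 0;
    while ($pos + 512 <= length $raw) {
        my $hdr = substr $raw, $pos, 512;
        last if $hdr !~ /[^\0]/;                      # all-zero block: end of archive
        my $name     = unpack 'Z100', $hdr;           # classic name field
        my ($digits) = substr($hdr, 124, 12) =~ /([0-7]+)/;
        my $size     = defined $digits ? oct $digits : 0;
        my $prefix   = unpack 'Z155', substr($hdr, 345, 155);
        push @fields, [$name, $prefix];
        # skip this header plus its data, which is padded to 512-byte blocks
        $pos += 512 + 512 * int(($size + 511) / 512);
    }
    return @fields;
}

my $tar = Archive::Tar->new;
$tar->add_data('data/foo/nc_readme.txt', "readme\n");
$tar->add_data('data/bar/TCPVIEW.HLP',   "help\n");

for my $no_prefix (0, 1) {
    local $Archive::Tar::DO_NOT_USE_PREFIX = $no_prefix;
    my $raw = $tar->write;    # no filename: archive returned as a string
    print "DO_NOT_USE_PREFIX=$no_prefix\n";
    for my $f (header_fields($raw)) {
        my ($name, $prefix) = @$f;
        my $full = length $prefix ? "$prefix/$name" : $name;
        print "  name-only reader sees:    $name\n";
        print "  prefix-aware reader sees: $full\n";
    }
}
```

A reader that looks only at the name field gets the directory part only when the writer put the whole path there.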

    My position is that Archive::Tar should be restored to its previous NON-POSIX default and that those who really need POSIX formatted archives should have to stipulate so. Apparently because this would inconvenience the very small minority of Perl Solaris users, Jos refuses to do so, preferring to inconvenience the thousands of Win32 users along with all those stuck on older machines with tar implementations that are not POSIX compatible.

    If you check the bug reports for Archive::Tar you will see that many of the open bugs are related to this, and that Jos just doesn't care. You can also find my patch to Archive::Tar there, which if you apply it will make your Archive::Tar work properly again.

    Jos has in the meantime told me he is willing to accept a patch that fixes some of the problem but that he is unwilling to de-POSIX it by default. Since I no longer have any need to manufacture Winzip compatible tar files and have lots of projects on my plate, I haven't had the time or inclination to do so. I would welcome it if somebody did. Alternatively somebody could use my patch to produce an Archive::Tar::Functional or something like that which would install into the Archive::Tar namespace and silently fix the matter.

    Regardless, I have to say this particular subject annoys me no end. Someone who takes over a core module and who breaks it on a major platform should do everything they can to get it working again, whatever their personal views are on what the defaults should be. Jos hasn't done this, which IMO is a dereliction of the duty he undertook when he accepted maint of the module.


      Regardless I have to say this particular subject annoys me no end

      I must say that I was pretty amazed (and still am) to find that I can:
      $tar->read('orig.tar', 0);   # read orig.tar into memory
      $tar->write('copy.tar', 0);  # write copy.tar from memory
      and end up with a copy.tar that's not identical to orig.tar. (Of course, that doesn't happen if $Archive::Tar::DO_NOT_USE_PREFIX is set.)

      You can also find my patch to Archive::Tar

      Does the application of that patch achieve something that setting $Archive::Tar::DO_NOT_USE_PREFIX fails to achieve ?


        Afair the patch does the following:

        1. Uses the traditional single field for the name when the name of a file and its path will fit into the original 100 char name field. This is the safest option if it's available, as it means the tar file can be read even by very old tars that know neither the GNU nor the POSIX format. In other words it bypasses the whole POSIX/GNU debate entirely for the vast majority of uses.
        2. Adds support for the GNU long file format which is currently unsupported by A::T. That is where a file with a long name is represented by two file records in the tar. The first record has a funky special name that tells tar that the name is in fact embedded in the (variable size) data portion, and the second has a similar label but the data portion has the file contents. This format allows filespecs of arbitrary size, not the braindamaged "we'll give you another 155 characters -- that should be enough" POSIX format.
        3. Changes the default long filename support format to GNU so that it will not produce POSIX file formats without being explicitly asked to. IME tar utilities that grok the GNU format are more common than ones that grok POSIX, although any new version of GNU tar will handle both correctly.
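
The three points amount to a small decision rule, sketched below. choose_layout() and its return strings are hypothetical names for illustration; this is not code from the patch or from Archive::Tar.

```perl
use strict;
use warnings;

use constant NAME_FIELD => 100;    # size of the classic tar name field

# Hypothetical helper: decide how a member path would be stored under the
# policy described above. $want_posix models an explicit opt-in.
sub choose_layout {
    my ($path, $want_posix) = @_;
    return 'plain name field'   if length($path) <= NAME_FIELD;
    return 'POSIX prefix field' if $want_posix;
    return 'GNU long name record';
}

print choose_layout('data/foo/nc_readme.txt'), "\n";
print choose_layout(('dir/' x 30) . 'file.txt'), "\n";
print choose_layout(('dir/' x 30) . 'file.txt', 1), "\n";
```

Short paths never touch either extension, which is why the first change matters most in practice.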

        In fact it is probably the first change that is the most important and useful. IMO it's not that common to pack files whose packed path is longer than 100 chars, and as such it's preferable to produce a tar which can be read by anything. As Larry has said: "be liberal with what you accept and conservative with what you produce". A::T should follow suit.

