Problems with Archive::Tar?

by Masem (Monsignor)
on Mar 18, 2002 at 14:40 UTC ( [id://152482] )

Masem has asked for the wisdom of the Perl Monks concerning the following question:

I tried to do a quick-but-useful project this weekend in the form of Tie::Hash::Tar, a way to access tar file contents via the hash mechanism. In this situation, I'm using the object interface of Archive::Tar, as opposed to the class methods. I found that generally there were no problems with this, until I tried to open an existing tar file (with or without compression) and add new files to it. After new() and read() calls on the object, the object is aware of all the data files in the archive, but has their contents set to undef by default. Unless those entries are modified, the files with undef contents are NOT written into the new tar file on write().

Now, the way I read the docs (having only done a cursory glance at the code so far), Archive::Tar only grabs file content when it's needed, to cut down on memory usage. Either I've misread this and A:T is supposed to bring in all the content from the start (which it isn't doing), or I'm right and A:T is mistakenly omitting entries that haven't been modified but also haven't been read in yet. In either case, I've found that forcing a get_content() on every file in the archive works around the problem, but that is certainly not memory-friendly.
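A rough sketch of that workaround (the file names here are only placeholders): touch every member's content before writing, so that write() doesn't drop the entries that were never modified.

    use strict;
    use Archive::Tar;

    my $tar = Archive::Tar->new();
    $tar->read('existing.tar.gz');

    # Force each member's content into memory; otherwise write()
    # silently omits the entries that were never modified.
    $tar->get_content($_) for $tar->list_files();

    $tar->add_data('new/file.txt', "new contents\n");
    $tar->write('existing.tar.gz', 1);   # second arg => write compressed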

For reference, this is with Perl 5.6.1 on a Linux i386 box, with the latest version of A:T and its supporting modules from CPAN.

I'm going to try to dive into the code at some point to find out what's supposed to happen, but before I do, I wonder if anyone else has seen this problem with A:T. The newsgroups are surprisingly quiet on it. In addition, even if I do find a problem and a way to fix it, discussion both here and on the newsgroups suggests that A:T is no longer supported by its author. How would I go about getting it patched or the like (assuming that contacting the author doesn't help)?

-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
"I can see my house from here!"
It's not what you know, but knowing how to find it if you don't know that's important

Replies are listed 'Best First'.
Re: Problems with Archive::Tar?
by rjray (Chaplain) on Mar 18, 2002 at 23:36 UTC

    From what I can tell by looking at the manual page, there are two interfaces for adding material to an archive: add_files and add_data. My guess is that you are using the add_files function, yes?

    The point of not reading the file data into memory up front is to conserve memory. The LWP::UserAgent class had a similar issue a few years back, in that it would attempt to load the entire contents of a file being sent via CGI file-upload into memory before starting the transaction. That became a problem with a 32MB file I was trying to send using HTTP as a transport layer :-).

    It does sound like what you have run into is a bug, in that I too would have assumed that files added with add_files would be dealt with in-place if the file already exists. I would consider using add_data, or just being in the habit of following add_files with calls to get_content.
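    A quick sketch of that habit (the file names are only placeholders): pull the content of each added file right after the add_files() call, so a later write() keeps it.

        use strict;
        use Archive::Tar;

        my $tar = Archive::Tar->new();
        $tar->read('archive.tar');

        my @new_files = ('lib/Foo.pm', 'lib/Bar.pm');
        $tar->add_files(@new_files);

        # Immediately force the data for the files just added into memory
        $tar->get_content($_) for @new_files;

        $tar->write('archive.tar');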

    As for the question of support/maintenance, the package has changed hands a few times already. But CPAN itself now makes use of it, so I believe it is safe to assume that it will never be completely orphaned. Updated in a timely fashion? That I cannot offer any guarantees on...

    --rjray

      Actually, I am using add_data (the idea is that with a tied hash, your keys are the file names and the values are the data in the archive). But now that I've done the source dive, I can see that, as written, Archive::Tar doesn't appear to be designed to modify existing tar files: it's set up either to read an existing archive or to create a new one from scratch. Line 334 of the source, in the function that reads the tar file, reads the data but simply throws it away so it can skip through the file quickly. When data is written, files that have no data associated with them are skipped, and thus existing entries in the archive are silently dropped. To some extent this makes sense, since you can't read from and write to a file at the same time. But then there are functions in the API, like replace_content, that seem to indicate the package ought to be able to do this. Again, it's not that it can't create an archive or read one on its own; it just can't do both.

      This leaves a couple of options. I can read the entire contents at the start and set them explicitly, but that carries the whole tar file around in memory during execution, which may not be great. I could wait until the tie'd object is destroyed, read the data in then (taking care not to stamp over data set during execution), and write everything out, but that fix-up code still has to be in place at some point. Another solution would be to extract the tar to temp space, map hash calls to file operations, and gather everything back up at the end, but that just moves the problem to disk space, which isn't necessarily great either. I could also consider rewriting A:T to suit my needs, or possibly, as jc suggested offsite, using Inline::C and libtar for tar access. But, IMO, save for the first two, the rest are overkill.
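      A rough sketch of the second option, in Tie::Hash terms (the 'tar', 'path', 'compressed' and 'touched' slots are hypothetical bookkeeping for the tie object, not anything from the real module), relying on the get_content() workaround noted above:

          sub DESTROY {
              my $self = shift;
              my $tar  = $self->{tar};

              # Force content into memory for members we never stored to,
              # so write() doesn't drop them from the new archive.
              for my $file ($tar->list_files()) {
                  next if $self->{touched}{$file};
                  $tar->get_content($file);
              }

              $tar->write($self->{path}, $self->{compressed});
          }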

      I think, for the short term, I'll consider a read-only tie'd class (as I was planning to use it, it would only be for reading contents from plug-ins, and thus wouldn't need write ability). There are enough issues with a two-way tar file that it may not be best to think in those terms just yet.
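      A bare-bones sketch of what such a read-only tie might look like (the package name is just illustrative, not the real Tie::Hash::Tar):

          package Tie::Hash::Tar::ReadOnly;
          use strict;
          use Carp;
          use Archive::Tar;

          sub TIEHASH {
              my ($class, $tarfile) = @_;
              my $tar = Archive::Tar->new();
              $tar->read($tarfile) or croak "Can't read tar file '$tarfile'";
              # Cache the member list once; it doubles as the key iterator.
              my @files = $tar->list_files();
              return bless { tar => $tar, files => \@files, idx => 0 }, $class;
          }

          sub FETCH  { my ($self, $key) = @_; return $self->{tar}->get_content($key) }
          sub EXISTS { my ($self, $key) = @_; return scalar grep { $_ eq $key } @{ $self->{files} } }

          sub FIRSTKEY { my $self = shift; $self->{idx} = 0; return $self->NEXTKEY }
          sub NEXTKEY  {
              my $self = shift;
              my $i = $self->{idx}++;
              return $i < @{ $self->{files} } ? $self->{files}[$i] : undef;
          }

          # Anything that would modify the archive is refused outright.
          sub STORE  { croak "This tie is read-only" }
          sub DELETE { croak "This tie is read-only" }
          sub CLEAR  { croak "This tie is read-only" }

          1;

      Usage would then be something like: tie my %tar, 'Tie::Hash::Tar::ReadOnly', 'plugins.tar'; print $tar{'README'};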

      -----------------------------------------------------
      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      "I can see my house from here!"
      It's not what you know, but knowing how to find it if you don't know that's important

        It looks as though something similar to a couple of your suggestions has been done with Meta::Archive::MyTar. That may be a possible starting point for your read-only version.

        --traveler
