PerlMonks  

Creating a tarball

by kcott (Chancellor)
on Feb 07, 2019 at 03:08 UTC (#1229516=perlquestion)
kcott has asked for the wisdom of the Perl Monks concerning the following question:

Given a directory structure which looks something like this (which I've been using for testing):

    fred
    fred/asd
    fred/derf
    fred/qwe
    fred/zxc
    fred/derf/fgh
    fred/derf/rty
    fred/derf/fgh/vbn

I'm looking for a way to implement the following command in Perl:

$ tar zcvf fred.tar.gz fred

I thought the builtin Archive::Tar module was possibly the way to go. I can get this to achieve what I want but it's horribly clunky:

    use Archive::Tar;
    my $tar = Archive::Tar::->new();
    $tar->add_files(glob "fred fred/* fred/*/* fred/*/*/*");
    $tar->write("fred.tar.gz", COMPRESS_GZIP);

This has additional problems in that the directory structure could be deeper and hidden files (starting with '.') are not captured by any of those glob patterns.

I tried using "fred" and "fred/" as the sole arguments to add_files() but they only pick up the top-level directory, not the rest of the directory structure. I can't see an add_directory() (or similarly named) method. I also tried with the create_archive() class method but had a similar lack of success.

I've also had a hunt around various core and CPAN modules in the Archive:: and IO:: namespaces: none seemed to do what I want.

I could write a recursive routine to capture the entire directory into an array (for add_files(@filenamelist)). That would be my fallback plan but I was hoping there might be a more elegant (less code) way of doing this.

Any ideas on how I might achieve what I want with either Archive::Tar, or some other module, would be appreciated.

— Ken

Replies are listed 'Best First'.
Re: Creating a tarball
by tybalt89 (Vicar) on Feb 07, 2019 at 06:57 UTC

    Untested:

        use Path::Tiny;
        path('fred')->visit(
            sub { -f || -d and $tar->add_files($_) },
            {recurse => 1}
        );

    EDIT: Fuller version (still untested):

        use Archive::Tar;
        my $tar = Archive::Tar::->new();
        my @filenamelist;
        use Path::Tiny;
        path('fred')->visit(
            sub { -f || -d and push @filenamelist, "$_" },
            {recurse => 1}
        );
        $tar->add_files(@filenamelist);
        $tar->write("fred.tar.gz", COMPRESS_GZIP);

    SECOND EDIT: Because I like Path::Tiny better than File::Find, although it has its quirks. For example, in the callback, $_ is a Path::Tiny object instead of a plain string, hence the "$_" which "stringifies" it.

      I'd guard that against unreadable files:

          sub { -f || -d and push @filenamelist, "$_" },
          =>
          sub { -f || -d and -r and push @filenamelist, "$_" },

      Enjoy, Have FUN! H.Merijn

        Which leads to the question: Should these files be silently ignored, or an error or warning be generated?

        (And code gets more and more complicated :)
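
One way to keep that decision out of the callback is to split the collected list afterwards and let the caller choose between warning and dying. This is only a sketch using core Perl: partition_readable() is a made-up helper name, and -r reflects the effective user's permissions (so it is always true for root).

```perl
use strict;
use warnings;

# partition_readable() is a hypothetical helper, not part of any module.
# It splits a list of paths into those that pass -r and those that do not
# (a path that does not exist also lands in the second list).
sub partition_readable {
    my @paths = @_;
    my (@readable, @skipped);
    for my $path (@paths) {
        if (-r $path) {
            push @readable, $path;
        }
        else {
            push @skipped, $path;
        }
    }
    return (\@readable, \@skipped);
}

# Example use: warn about anything skipped, then archive only the rest.
my ($keep, $skip) = partition_readable(@ARGV);
warn "Skipping unreadable path: $_\n" for @$skip;
```

Swapping the warn for a die turns the same check into a hard error, which may suit a spec that requires every file to be readable.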

        G'day Tux,

        You raise a good point and, in general, my code is littered with such checks. However, in this instance, the directory structure is generated afresh immediately before the tarball creation, and the spec requires all files in the directory structure to be readable. So, while that is something I would normally do, it's not needed in this particular situation. Thanks for mentioning it anyway.

        — Ken

      G'day tybalt89,

      Thanks. That looks like a nice, succinct solution.

      I'm extending legacy code, which already uses Path::Tiny, so that removes any additional, dependency-related issues (e.g. Makefile.PL). I'll test this out tomorrow.

      By the way, thanks for the additional information, e.g. object stringification.

      Update: Added "[Path::Tiny]" to the title.

      — Ken

      Fuller version (still untested)

      I was happy to see your post using Path::Tiny's visit method. This works well for me:

Re: Creating a tarball
by haj (Hermit) on Feb 07, 2019 at 06:44 UTC

    I doubt that the module Archive::Tar can help here. On the other hand, the utility ptar, which comes with this module, mimics the tar command rather closely: it uses File::Find to collect files in a given directory. So I guess that your best choice would be to use File::Find instead of rolling your own recursive routine, copying whatever is useful from ptar.
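
For reference, a minimal File::Find sketch of that idea follows. It is not copied from ptar; collect_paths() is a made-up name, and it assumes the directory name is given relative to the current directory so that the archive member names come out relative too.

```perl
use strict;
use warnings;
use File::Find;

# collect_paths() is a hypothetical helper, not part of File::Find.
# It returns the top directory plus every plain file and directory below
# it, including names that start with a dot.
sub collect_paths {
    my ($top) = @_;
    my @paths;
    find(
        {
            no_chdir => 1,    # with no_chdir, $_ holds the full path
            wanted   => sub {
                push @paths, $File::Find::name if ( -f $_ || -d $_ );
            },
        },
        $top,
    );
    my @sorted = sort @paths;
    return @sorted;
}
```

The returned list could then be handed to add_files() in the same way as the glob-based list in the original question.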

      G'day haj,

      Thanks for your feedback.

      I did come across ptar during my original research but didn't look at its source code: I wasn't aware of its use of File::Find. I'm extending legacy code which already uses Path::Tiny — discussed elsewhere in this thread — so, in the absence of other suggestions, that's probably the direction in which I'm heading. Anyway, thanks for your suggestion: I'll keep it in mind as a possible alternative option.

      — Ken

        Path::Tiny is fine! The ptar utility shouldn't use it because Path::Tiny is not in the "core" Perl distribution (yet), but since you already have it in your code, I'd prefer it, too.
Re: Creating a tarball [SOLUTION]
by kcott (Chancellor) on Feb 07, 2019 at 23:27 UTC

    My thanks to all who provided help and assistance with this. I now have a working test solution which I present below as this may be useful for others.

    I used Path::Tiny, as originally shown by ++tybalt89, to generate the list of files for archiving. I needed to tweak the suggested code in a few places, as follows:

    • Path::Tiny's visit() method "Executes a callback for each child of a directory." This meant that the parent directory wasn't added to the array for add_files(): easily fixed by preloading that array with the parent directory.
    • For my purposes, the contents of the directory structure were well known, so no filtering (based on file types, permissions or other characteristics) was required. In other situations, some filtering may be required: see ++tybalt89's and ++Tux' examples[1,2] for possible ways to achieve this.
    • I also needed to change directories to get the exact archive I wanted. This may not be necessary in other circumstances. Note the use of the autodie pragma in a limited, lexical scope to handle problems with chdir and provide useful feedback if they occur.
    • Not an actual tweak required by this code, but use 5.016; was added because it mirrors the version I'm coding to for $work$ (and increases confidence that this won't have issues in my production environment). If you're using this code as a template, and not adding anything fancy, replacing that with a simple use strict; would probably be fine.
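
If the change of directory is needed in more than one place, the chdir-and-return step can be wrapped so that the original directory is restored even when the code in between dies. The following is a sketch of that pattern with core modules only, not what the test script does; in_dir() is a made-up name.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# in_dir() is a hypothetical helper: run $code with $dir as the current
# directory, then change back, re-throwing any error from $code.
sub in_dir {
    my ($dir, $code) = @_;
    my $orig = getcwd();
    chdir $dir or die "Cannot chdir to '$dir': $!\n";

    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;

    chdir $orig or die "Cannot chdir back to '$orig': $!\n";
    die $err unless $ok;
    return @result;
}
```

A call such as in_dir($src_dir, sub { ... }) would then hold the list-building and archive-writing steps.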

    Here's the test script. Note that the directory structure under $src_dir is exactly the same as that presented in the OP.

        #!/usr/bin/env perl

        use 5.016;
        use warnings;

        use Archive::Tar;
        use Cwd;
        use Path::Tiny;

        my $src_dir = '/Users/ken/tmp/test_arch/src';
        my $zip_dir = '/Users/ken/tmp/test_arch/zip';
        my $top_dir = 'fred';
        my $zip_name = 'fred.tar.gz';
        my $zip_path = path($zip_dir, $zip_name);

        {
            use autodie;

            my $cur_dir = getcwd;
            chdir $src_dir or die;

            my @tar_files = ($top_dir);
            path($top_dir)->visit(
                sub { push @tar_files, "$_" },
                { recurse => 1 }
            );

            my $tar = Archive::Tar::->new();
            $tar->add_files(@tar_files);
            $tar->write($zip_path, COMPRESS_GZIP);

            chdir $cur_dir or die;
        }

    A sample run, as well as various checks, are in the spoiler:

    — Ken
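
The spoiler's contents are not reproduced here. As a rough idea of the kind of check that could be run, the sketch below reads an archive back with Archive::Tar and reports which expected names are absent. Both helper names are made up, and trailing slashes are stripped before comparing because directory members may be stored with or without one.

```perl
use strict;
use warnings;
use Archive::Tar;

# archive_names() and missing_from_archive() are hypothetical helpers.

# Return the member names of a tarball, without trailing slashes.
sub archive_names {
    my ($tarball) = @_;
    my $tar = Archive::Tar->new($tarball)
        or die "Cannot read '$tarball'\n";
    my @names;
    for my $name ($tar->list_files) {
        $name =~ s{/+\z}{};
        push @names, $name;
    }
    return @names;
}

# Return those of @expected that are not members of the tarball.
sub missing_from_archive {
    my ($tarball, @expected) = @_;
    my %in_archive = map { $_ => 1 } archive_names($tarball);
    return grep { !$in_archive{$_} } @expected;
}
```

An empty list from missing_from_archive('fred.tar.gz', @tar_files) would suggest that every collected path made it into the archive.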
