Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Compressing files on an entire disk

by justin423 (Beadle)
on Feb 07, 2023 at 15:59 UTC ( #11150213=perlquestion: print w/replies, xml ) Need Help??

justin423 has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to use IO:compress to:
1. Scan an entire folder with dozens of subfolders in Windows
2. Compress all files individually that have the same filename ending. (not extension, but filename) e.g. all files that have 00 as the last two characters of the filename. e.g.. xxxxx00 becomes xxxxx00.zip
3. then delete the original?

The example in IO:Compress puts all the files together in one archive, but I want to keep them separate, so they can be uncompressed individually.

Replies are listed 'Best First'.
Re: Compressing files on an entire disk
by hippo (Bishop) on Feb 07, 2023 at 16:29 UTC

    You can use File::Find::Rule (or any other of the similar tree-recursing modules) to create your list of files and then apply whatever per-file process you desire to each of them. That's how I would approach this task, anyway.

    Note that you can extract single files from zip archives should you so wish, so you could still create just one big archive - in case that's an option.


    🦛

      ok, I got the search to print a list of files, but I am getting a syntax error when I try to zip them individually.
      my @files = find( file => 'name' => [ ], size => '', in => $directory ); # @output="@files.zip"; my @status = zip @files => @output or die "zip failed: $ZipError\n";
        for my $src (@files) { my $status = zip $src => "${src}.zip" or die "zip failed: $ZipError\n"; } # And, if you're confident unlink @files;

        You cannot just arbitrarily throw arrays into subroutines which explicitly require scalar arguments and expect it to work.


        🦛

Re: Compressing files on an entire disk
by kcott (Archbishop) on Feb 08, 2023 at 04:16 UTC

    G'day justin423,

    Here's how you could do this in a single pass using only core features. I've included verbose output so you can see what's going on: you may not want some, or any, of this. In production, the $dir and $re values would be supplied via arguments, options, config file, or similar. Also, I've aimed for Perl v5.8; there are various improvements you could make with a more recent version — feel free to ask if you're unsure what those might be (stating, of course, the Perl version you're using).

    #!/usr/bin/env perl use strict; use warnings; use autodie; use constant INDENT_PLUS => ' '; use File::Spec; use IO::Compress::Zip qw{zip $ZipError}; my $dir = '/home/ken/tmp/pm_11150213_zip_dirtree'; my $re = qr{00$}; comp00($dir, $re); sub comp00 { my ($dir, $re, $indent) = @_; $indent = '' unless defined $indent; print "$indent** Dir: '$dir'\n"; opendir(my $dh, $dir); for my $file (grep ! /^(?:\.|\.\.)$/, readdir $dh) { my $path = File::Spec::->catfile($dir, $file); print "$indent* Path: '$path'\n"; if (-f $path) { if ($path =~ $re) { eval { my $zip = "$path.zip"; zip $path, $zip or die $ZipError; print "${indent}Zipped: '$path'\n"; unlink $path; print "${indent}Deleted: '$path'\n"; 1; } or do { my $err_msg = $@; print "${indent}FAILED! $err_msg\n"; }; } else { print "${indent}Skipped: '$path' (doesn't match patter +n '$re')\n"; } } elsif (-d _) { print "${indent}Recurse: '$path' is a directory\n"; comp00($path, $re, $indent . INDENT_PLUS); } else { print "${indent}Skipped: '$path' (not plain file or direct +ory)\n"; } } closedir $dh; return; }

    For testing, I created /home/ken/tmp/pm_11150213_zip_dirtree then ran the following to set up test directories and files. Sorry, I don't know the equivalent MS-DOS commands.

    ken@titan ~/tmp/pm_11150213_zip_dirtree $ rm -rf a d ken@titan ~/tmp/pm_11150213_zip_dirtree $ mkdir -p a/b a/c d/e d/f/g ken@titan ~/tmp/pm_11150213_zip_dirtree $ for dir in a a/b a/c d d/e d/f d/f/g; do > $dir/x00; > $dir/y; > $di +r/z01; done

    Here's the output from a sample run:

    ken@titan ~/tmp/pm_11150213_zip_dirtree $ ./compress00.pl ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/.compress00.pl.swp' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/.compress00.pl.swp' (d +oesn't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/a' is a directory ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/a' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b' is a director +y ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/y' (doesn' +t match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/b/z01' (does +n't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c' is a director +y ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/y' (doesn' +t match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/c/z01' (does +n't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/a/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/y' (doesn't matc +h pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/a/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/a/z01' (doesn't ma +tch pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/compress00.pl' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/compress00.pl' (doesn' +t match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/d' is a directory ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/d' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e' is a director +y ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/y' (doesn' +t match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/e/z01' (does +n't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f' is a director +y ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g' Recurse: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g' is a di +rectory ** Dir: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/y' ( +doesn't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/g/z01' + (doesn't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/y' (doesn' +t match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/f/z01' (does +n't match pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/x00' Zipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/x00' Deleted: '/home/ken/tmp/pm_11150213_zip_dirtree/d/x00' * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/y' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/y' (doesn't matc +h pattern '(?^:00$)') * Path: '/home/ken/tmp/pm_11150213_zip_dirtree/d/z01' Skipped: '/home/ken/tmp/pm_11150213_zip_dirtree/d/z01' (doesn't ma +tch pattern '(?^:00$)')

    — Ken

Re: Compressing files on an entire disk
by tybalt89 (Monsignor) on Feb 17, 2023 at 19:21 UTC

    Simple non-recursive way.
    (I don't have Windows to test on, but this runs on ArchLinux.)

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11150213 use warnings; use IO::Compress::Zip qw{zip $ZipError}; my @paths = '.'; # FIXME set to the base of your folder while( defined( my $path = pop @paths) ) { if( -d $path ) { opendir my $dh, $path or die "dir $path errro $1 on opendir"; push @paths, map "$path/$_", grep !/^\.\.?\z/, readdir $dh; } elsif( -f $path and $path =~ /00\z/ ) { print "found to zip: $path\n"; # (zip $path, "$path.zip") ? unlink $path : die "$ZipError on $path" +; # untested } }

    Of course, uncomment the 'zip' line to have it actually do the zipping, and set @paths to the appropriate directory (or directories).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11150213]
Approved by marto
Front-paged by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (1)
As of 2023-04-02 09:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?