
Automatic zip and delete

by justin423 (Acolyte)
on Sep 02, 2021 at 17:32 UTC

justin423 has asked for the wisdom of the Perl Monks concerning the following question:

Is there a script that will zip all files on a drive that have a predictable filename, then delete the originals?
All files end in FILENAME.GxxxxV00, where the four x's are digits.
Is there a little utility that will scan the entire drive for those files, zip each one up with the number retained and the V00 converted to the file extension .zip (e.g. FILENAME.GxxxxV00 becomes FILENAME.Gxxxx.zip), and then delete the original file to free up space on the drive?
This seems like something that Perl was built for.
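A minimal Perl sketch of the renaming rule described in the question, assuming a name always ends in `.G`, exactly four digits, then `V00`. The sub name `zip_name_for` and the sample file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: map FILENAME.GxxxxV00 to FILENAME.Gxxxx.zip.
# Assumes ".G", exactly four digits, then "V00" at the very end of the name.
# Returns undef for names that do not match, so callers can skip them.
sub zip_name_for {
    my ($path) = @_;
    return $path =~ /^(.*\.G\d{4})V00\z/ ? "$1.zip" : undef;
}

# Show which sample names would be converted and which would be skipped.
for my $name ('REPORT.G0042V00', 'REPORT.G0042V01', 'notes.txt') {
    my $zip = zip_name_for($name);
    printf "%-16s -> %s\n", $name, defined $zip ? $zip : '(skipped)';
}
```

Anchoring the pattern with `\z` keeps names such as `REPORT.G0042V00.bak` from matching by accident.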

Re: Automatic zip and delete
by AnomalousMonk (Bishop) on Sep 02, 2021 at 19:57 UTC
    ... a little utility ...

    In my experience, you can do this with most file-zip utilities, e.g., winzip and 7-zip. It's called "moving" the file from source to destination. I haven't checked (your job), but I think pretty much all such utilities can do something like this.

    Give a man a fish:  <%-{-{-{-<

      ISTR that 7-zip explicitly refrained from implementing a "move" command because of the great potential for losing files if the archive gets corrupted after the file has been "moved". However, I can't find the relevant discussion anymore :-(
      I posted this because I was sure that this type of utility had to have been written before, since this functionality would be extremely useful and I was surprised to find out that it wasn't included in 7-zip.
Re: Automatic zip and delete
by Marshall (Canon) on Sep 06, 2021 at 03:08 UTC
    When I first read your question, I thought that your intent to "create space" meant that once a huge .zip file had been created, it would be archived somewhere else and a whole mess of files all over the place would be deleted. On second reading, it appears that you just intend to do a one-to-one replacement of each original file with a zipped version of that file.

    7-zip is a good choice. From my experience on Windows, 7-zip is faster and generates a smaller file (compresses better) than the built-in zip functionality while still being 100% compatible with the standard unzipper.

    I would prioritize data reliability over any speed considerations. I would suggest using File::Find or one of its cousins to build an array with the full path names of all of the work to be done. Of course, mileage varies, but in general I strive to have the wanted() routine do as little actual work as possible. find() changes directories as it goes about its work, and if some calculation causes an error, you could be left with the current working directory at some random point. Making a "to do list" adds very little computational effort, and it has the added bonus of letting you report progress as "% completed" as you go along. Long-running programs sometimes get killed by a user who thinks they are "hung" when in fact they are not.
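    A rough sketch of that two-phase approach: `wanted()` only records matching paths (with `no_chdir` set, so the working directory never changes), and a separate loop does the work while printing percent complete. The sub names, the `.G\d{4}V00` pattern, and the callback interface are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Phase 1 (hypothetical helper): collect full paths only.
# wanted() does nothing but a name test; no_chdir keeps the cwd fixed,
# and $_ equals $File::Find::name (the full path) in that mode.
sub build_todo {
    my ($root) = @_;
    my @todo;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                push @todo, $File::Find::name
                  if -f $File::Find::name && /\.G\d{4}V00\z/;
            },
        },
        $root
    );
    my @sorted = sort @todo;
    return @sorted;
}

# Phase 2 (hypothetical helper): run a caller-supplied callback on each path,
# printing "% completed" to STDERR so a long run does not look hung.
sub run_todo {
    my ($process_one, @todo) = @_;
    my $total = @todo;
    my $done  = 0;
    for my $path (@todo) {
        $process_one->($path);
        $done++;
        printf STDERR "\r%5.1f%% (%d/%d)", 100 * $done / $total, $done, $total;
    }
    print STDERR "\n" if $total;
    return $done;
}
```

Because the full list exists before any file is touched, a dry run (printing the list) is also trivial.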

    I would not do this unless there is some compelling reason to do so, but the "to do list" could be a work queue that is parceled out to N threads running in parallel.
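    If parallelism ever did become worthwhile, one way to parcel out the to-do list is sketched below. Note that it swaps in `fork` (one child process per slice) rather than the threads mentioned above; on Windows, Perl emulates `fork` with interpreter threads. `run_parallel` and its callback are hypothetical names, and the callback is assumed to return true on success.

```perl
use strict;
use warnings;

# Hypothetical helper: deal the to-do list round-robin into $n slices and
# fork one child per slice. Each child runs $work->($path) on its own paths
# and exits non-zero if any call returned false (or died).
# Returns the number of children that reported failure.
sub run_parallel {
    my ($n, $work, @todo) = @_;
    my @slices;
    push @{ $slices[$_ % $n] }, $todo[$_] for 0 .. $#todo;

    my @kids;
    for my $slice (grep { defined } @slices) {
        my $pid = fork;
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {    # child process
            my $ok = 1;
            for my $path (@$slice) {
                $ok = 0 unless $work->($path);
            }
            exit($ok ? 0 : 1);
        }
        push @kids, $pid;   # parent keeps track of its children
    }

    my $failed = 0;
    for my $pid (@kids) {
        waitpid($pid, 0);
        $failed++ if $? != 0;
    }
    return $failed;
}
```

Each path goes to exactly one child, so no two workers ever touch the same file.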

    Do not delete the original file until you are certain that the zip file has been properly created. This rules out any kind of "move" to zip file functionality.
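    A minimal sketch of "zip, verify, only then delete" using the core modules IO::Compress::Zip and IO::Uncompress::Unzip. It assumes the `.GxxxxV00` to `.Gxxxx.zip` naming from the question, refuses to overwrite an existing archive, and deletes the original only after the archive decompresses to exactly the same bytes. The sub names are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use IO::Compress::Zip qw(zip $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper: true only if the archive's first member decompresses
# to exactly the bytes of the original file (compared in 64 KiB chunks).
sub zip_matches_original {
    my ($zip_path, $orig_path) = @_;
    my $z = IO::Uncompress::Unzip->new($zip_path) or return 0;
    open my $in, '<:raw', $orig_path or return 0;
    my ($zbuf, $fbuf);
    while (1) {
        my $zn = $z->read($zbuf, 65536);
        my $fn = read($in, $fbuf, 65536);
        return 0 if $zn < 0 || !defined $fn;    # read error on either side
        return 0 if $zn != $fn;                 # lengths differ
        last if $zn == 0;                       # both reached EOF together
        return 0 if $zbuf ne $fbuf;             # contents differ
    }
    $z->close;
    close $in;
    return 1;
}

# Hypothetical helper: FILENAME.GxxxxV00 -> FILENAME.Gxxxx.zip, then verify,
# and only then unlink the original. Returns 1 on success, 0 otherwise.
sub zip_verify_delete {
    my ($orig) = @_;
    (my $zip_path = $orig) =~ s/V00\z/.zip/ or return 0;
    return 0 if -e $zip_path;    # never overwrite an existing archive
    zip($orig => $zip_path, Name => basename($orig), BinModeIn => 1)
      or do { warn "zip failed for $orig: $ZipError\n"; return 0 };
    unless (zip_matches_original($zip_path, $orig)) {
        warn "verification failed for $orig; original kept\n";
        unlink $zip_path;
        return 0;
    }
    unlink $orig or do { warn "could not delete $orig: $!\n"; return 0 };
    return 1;
}
```

The 1/0 return value lets a driver loop count failures and leave those originals in place for a later look.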

    You should run some tests to estimate how much space you will save by doing this. The compression ratio is highly data-dependent. I suspect that more space could be saved by deleting or archiving completely obsolete files. Disks are relatively cheap.
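    One way to run such a test, sketched with IO::Compress::Zip compressing a few sample files in memory. `estimate_ratio` is a hypothetical name, and the result is only as representative as the sample you feed it.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

# Hypothetical helper: compress each sample file into a scalar and return
# total compressed bytes / total original bytes (undef if nothing to measure).
sub estimate_ratio {
    my @files = @_;
    my ($orig_total, $zip_total) = (0, 0);
    for my $f (@files) {
        my $compressed;
        zip($f => \$compressed, BinModeIn => 1)
          or die "zip $f: $ZipError\n";
        $orig_total += (-s $f) || 0;    # -s is false for empty files
        $zip_total  += length $compressed;
    }
    return $orig_total ? $zip_total / $orig_total : undef;
}
```

Multiplying the ratio by the total size of the matching files gives a rough estimate of the space the zipped copies would take.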

    Have you considered what would happen if one of these files happened to be in use by another program at the time? There are many "yeah, buts" and "what ifs" down that rabbit hole. I am just mentioning it as "something to consider".
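    A crude heuristic, sketched under the assumption that a file nobody has written to for a while is probably finished: skip anything modified within the last N seconds. It does not detect a process that merely has the file open for reading, so it reduces the risk rather than removing it. `old_enough` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path exists and its modification time
# (stat field 9) is at least $min_age_secs seconds in the past.
# Heuristic only: an open-but-idle file still passes this check.
sub old_enough {
    my ($path, $min_age_secs) = @_;
    my @st = stat $path or return 0;    # missing or unreadable: skip it
    my $age = time - $st[9];
    return $age >= $min_age_secs ? 1 : 0;
}
```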

      It is a local copy of a file that is archived elsewhere. These files are all plain text, so they compress to about 10% of their original size.
      There are dozens of jobs that were written to download these files and extract the relevant data from them, and they then just leave the original file behind, hence the need for a bulk zip/delete.
      I want to keep some of the original files (1 out of every 8), because the system they are downloaded from purges them on a schedule, so a small sample of the files gives a history of how the data has changed over time.
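    A sketch of picking that 1-in-8 sample, assuming the four digits are a sequence number per base name, so sampling is done separately for each FILENAME. What happens to each list afterwards is left to the caller; `split_keep_every` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: group FILENAME.GxxxxV00 paths by FILENAME, sort each
# group by its four-digit number, and put every $n-th file (starting with the
# lowest number) in @keep; all other matching files go to @process.
# Non-matching paths are ignored.
sub split_keep_every {
    my ($n, @paths) = @_;
    my %by_base;
    for my $p (@paths) {
        next unless $p =~ /^(.*)\.G(\d{4})V00\z/;
        push @{ $by_base{$1} }, [$2, $p];
    }
    my (@keep, @process);
    for my $base (sort keys %by_base) {
        my @gens = sort { $a->[0] <=> $b->[0] } @{ $by_base{$base} };
        for my $i (0 .. $#gens) {
            push @{ $i % $n == 0 ? \@keep : \@process }, $gens[$i][1];
        }
    }
    return (\@keep, \@process);
}
```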
Re: Automatic zip and delete
by tybalt89 (Prior) on Sep 03, 2021 at 17:41 UTC

    Totally untested and very possibly very dangerous :)

    find . -type f -name '*V00' | perl -nle '/(.*)V00\z/ and system("zip", "-q", "-j", "-T", "$1.zip", $_) == 0 and unlink $_'
Re: Automatic zip and delete
by LanX (Sage) on Sep 02, 2021 at 17:41 UTC
    > this seems like something that Perl was built for.

    yep, you can do that with Perl.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re: Automatic zip and delete
by Anonymous Monk on Sep 03, 2021 at 15:44 UTC

    If you're on linux, you might take a look at logrotate. It's quite powerful and flexible, and I think might do what you're trying to achieve. I use it at $work.

Node Type: perlquestion [id://11136374]
Approved by Corion