PerlMonks  

Re: Automatic zip and delete

by Marshall (Canon)
on Sep 06, 2021 at 03:08 UTC ( #11136488=note )


in reply to Automatic zip and delete

When I first read your question, I thought that your intent to "create space" meant that once a huge .zip file had been created, it would be archived somewhere else and a whole mess of files all over the place would be deleted. Upon second reading, it appears that you just intend to do a one-to-one replacement of each original file with a zipped version of that file.

7-zip is a good choice. From my experience on Windows, 7-zip is faster and generates a smaller file (compresses better) than the built-in zip functionality while still being 100% compatible with the standard unzipper.

I would prioritize data reliability over any speed considerations. I would suggest using File::Find or one of its cousins to create an array with the full path names of all of the work to be done. Of course mileage does vary, but in general I strive to have the wanted() routine do as little actual work as possible. The find() routine changes directories as it goes about its work, and if some calculation causes an error mid-traversal, you could be left with the current working directory at some random point. Making a "to do list" first doesn't add much computational effort at all, and it has the added bonus that you can report progress in terms of "% completed" as you go along. Long-running programs sometimes get killed by a user who thinks the program is "hung" when in fact it is not.
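A minimal sketch of that approach, with the traversal doing nothing but collecting names (build_todo_list is a name I made up, and '.' is a placeholder for your top-level data directory):

```perl
use strict;
use warnings;
use File::Find;

# Collect the full path of every plain file under $top_dir that is
# not already a .zip -- the "to do list".  wanted() only records
# names; no zipping happens during the traversal itself.
sub build_todo_list {
    my ($top_dir) = @_;
    my @todo;
    find(
        sub {
            return unless -f $_;            # plain files only
            return if /\.zip\z/i;           # skip already-zipped files
            push @todo, $File::Find::name;  # path from $top_dir down
        },
        $top_dir,
    );
    return @todo;
}

# Progress reporting falls out naturally once the list exists:
my @todo  = build_todo_list('.');
my $total = @todo;
for my $i (0 .. $#todo) {
    printf "%5.1f%% %s\n", 100 * ($i + 1) / $total, $todo[$i];
    # ... zip one file, verify it, then delete the original ...
}
```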

I would not do this unless there is some compelling reason to do so, but the "to do list" could be a work queue that is parceled out to N threads running in parallel.
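If you did go that route, a sketch using the core threads and Thread::Queue modules might look like this (run_pool is a name I made up, and process_one is a stub standing in for the zip-verify-delete step):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

sub process_one { my ($path) = @_; }   # stub for the real per-file work

# Parcel a finished "to do list" out to $n_workers threads.
sub run_pool {
    my ($n_workers, @todo) = @_;
    my $q = Thread::Queue->new(@todo);
    $q->end;   # list is complete; dequeue() returns undef once drained
    my @workers = map {
        threads->create(sub {
            my $done = 0;
            while (defined(my $path = $q->dequeue)) {
                process_one($path);
                $done++;
            }
            return $done;
        });
    } 1 .. $n_workers;
    my $total = 0;
    $total += $_->join for @workers;
    return $total;   # number of items actually processed
}

# Usage would be something like:  my $n = run_pool(4, @todo);
```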

Do not delete the original file until you are certain that the zip file has been properly created. This rules out any kind of "move" to zip file functionality.
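One way to enforce that ordering, sketched here with the core IO::Compress::Zip module rather than 7-zip (with 7-zip via system() the shape is the same: check every status before unlinking; zip_then_delete is a hypothetical helper name):

```perl
use strict;
use warnings;
use File::Basename        qw(basename);
use IO::Compress::Zip     qw(zip   $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Delete the original only after the archive has been created AND
# read back successfully.  On any failure, keep the original.
sub zip_then_delete {
    my ($file) = @_;
    my $zipfile = "$file.zip";

    zip($file => $zipfile, Name => basename($file))
        or do { warn "zip failed for $file: $ZipError\n"; return 0 };

    # Verify: unpack into memory and compare sizes with the original.
    my $roundtrip = '';
    unzip($zipfile => \$roundtrip)
        or do { warn "verify failed for $zipfile: $UnzipError\n"; return 0 };
    return 0 unless length($roundtrip) == -s $file;

    unlink $file or warn "could not delete $file: $!\n";
    return 1;
}
```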

You should run some tests to estimate how much space you will save by doing this; the compression ratio is highly data dependent. I suspect that more space could be saved by deleting or archiving completely obsolete files. Disks are relatively cheap.
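A quick way to run such a test is to compress a sample of files in memory and compare byte counts (estimate_ratio is a hypothetical helper using the core IO::Compress::Zip module):

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

# Compress each sample file into memory and report the overall
# compressed/original size ratio (smaller is better).
sub estimate_ratio {
    my (@sample) = @_;
    my ($before, $after) = (0, 0);
    for my $file (@sample) {
        my $buf = '';
        zip($file => \$buf)
            or do { warn "skipping $file: $ZipError\n"; next };
        $before += -s $file;
        $after  += length $buf;
    }
    return $before ? $after / $before : undef;
}

# e.g.  printf "estimated ratio: %.2f\n", estimate_ratio(@some_files);
```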

Have you considered what would happen if one of these files happened to be in use by another program at the time? There are many "yeah, buts" and "what ifs" down that rabbit hole. I am just mentioning it as something to consider.

Replies are listed 'Best First'.
Re^2: Automatic zip and delete
by justin423 (Acolyte) on Sep 13, 2021 at 14:33 UTC
    It is a local copy of a file that is archived elsewhere. These files are all plain text, so they compress to about 10% of the original size.
    There are dozens of jobs that were written to download these files, extract the relevant data from them, and then just leave the original file behind, hence the need for a bulk zip/delete.
    I want to keep some of the original files (1 file out of every 8), as the system they are downloaded from purges them on a schedule, so a small sampling of the files gives a history of how the data has changed over time.
