PerlMonks  

How do I find and delete files based on age?

by macvsog (Novice)
on Feb 26, 2007 at 14:30 UTC (#602128=perlquestion)
macvsog has asked for the wisdom of the Perl Monks concerning the following question:

I'm just starting to try and learn Perl, but it's a sort of trial-by-fire situation. I need to create a script right away to resolve a problem. Were I doing this in the Windows environment instead of Unix, I'd already have the script done and compiled (because in Windows, I use the Winbatch scripting language). But I'm still learning shell scripting, let alone Perl. So I could use a little help.

My question is as to whether or not Perl has the necessary functions for me to be able to create the script in Perl. I've been through the built-in function list and I don't see much in the way of file manipulation functions or date/time functions.

I need to create a Perl script (or a shell script, if it's possible to do it that way, using the tcsh shell) that will examine the contents of the /var/backups/repository directory, find all files and directories whose names begin with the characters DATE_ , determine which of those are more than two weeks old, and delete the ones that are. If they are directories, it would first have to delete all the files in that directory and then delete the directory. Can anyone point me in the right direction?

Re: How do I find and delete files based on age?
by philcrow (Priest) on Feb 26, 2007 at 14:43 UTC
    Perl provides a lot of its functionality through external modules like File::Find. Some of them ship with Perl, others are available for download from CPAN, like DateTime.

    Those two modules will probably solve your problems. Some people prefer File::Find::Rule over File::Find.

    Phil

      that totally works
Re: How do I find and delete files based on age?
by Fletch (Chancellor) on Feb 26, 2007 at 14:44 UTC

    Familiarity with POSIX and/or C can help in knowing what to look for. You're interested in stat for retrieving file information, or possibly the -M operator (documented in perlfunc as well, type perldoc -f -X at your shell prompt). And File::Find or File::Find::Rule will help you traverse your filesystem to get the victim directories' names. File::Path has routines for blowing away directory trees, but it may be just as simple to shell out to rm -rf ./blah via system.
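A minimal sketch of the -M plus File::Path approach mentioned above, using only core modules. The sub names old_date_entries and delete_entry are made up for illustration, and remove_tree is the name used by current File::Path releases (older ones call it rmtree). Note that -M reports age in days relative to when the script started.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Return the entries directly inside $dir whose names start with DATE_
# and whose last modification is more than $max_days days ago.
sub old_date_entries {
    my ($dir, $max_days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @old;
    for my $name (sort readdir $dh) {
        next unless $name =~ /^DATE_/;
        my $path = "$dir/$name";
        my $age  = -M $path;          # age in days, undef if stat failed
        next unless defined $age;
        push @old, $path if $age > $max_days;
    }
    closedir $dh;
    return @old;
}

# Delete a plain file with unlink, or a real directory (not a symlink
# to one) together with its contents using remove_tree.
sub delete_entry {
    my ($path) = @_;
    if (-d $path && !-l $path) {
        remove_tree($path);
    }
    else {
        unlink $path or warn "Cannot delete $path: $!";
    }
}

# Example use (print first, delete only once the list looks right):
# print "$_\n" for old_date_entries('/var/backups/repository', 14);
```

Printing the list before wiring in delete_entry keeps the destructive step separate from the selection step.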

Re: How do I find and delete files based on age?
by davorg (Chancellor) on Feb 26, 2007 at 14:58 UTC

    (Please use a more descriptive title for your questions)

    In pseudocode, your solution would be something like this:

    • Get the date two weeks ago (using time and some basic arithmetic to subtract 14 days' worth of seconds)
    • Open a directory handle with opendir
    • For each file read from the directory handle with readdir...
      • Ignore the file unless it starts with DATE_ (check with a regular expression)
      • Parse the date and time out of the filename (using a regex)
      • If the parsed date is earlier than the two-weeks-ago date that you calculated at the start, then delete the file
    • Read the next file from the directory handle
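The pseudocode above could be sketched roughly as follows. The thread never says how the date is encoded in the name, so the DATE_YYYYMMDD layout here is purely an assumption, and epoch_from_name and expired_names are hypothetical helper names. The sketch only returns the expired names; deleting them is left to the caller.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# ASSUMPTION: names look like DATE_YYYYMMDD (e.g. DATE_20070212).
# Adjust the regex to whatever your real naming scheme is.
# Returns midnight local time for that date, or undef if it cannot be parsed.
sub epoch_from_name {
    my ($name) = @_;
    my ($y, $m, $d) = $name =~ /^DATE_(\d{4})(\d{2})(\d{2})/
        or return;
    # timelocal dies on impossible dates, so trap that and return undef
    return eval { timelocal(0, 0, 0, $d, $m - 1, $y) };
}

# Names in $dir that start with DATE_ and carry a date earlier than $cutoff.
sub expired_names {
    my ($dir, $cutoff) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @expired;
    while (defined(my $name = readdir $dh)) {
        next unless $name =~ /^DATE_/;
        my $stamp = epoch_from_name($name);
        next unless defined $stamp;
        push @expired, $name if $stamp < $cutoff;
    }
    closedir $dh;
    @expired = sort @expired;
    return @expired;
}

# Example use:
# my $cutoff = time - 14 * 24 * 60 * 60;   # two weeks ago, in seconds
# print "$_\n" for expired_names('/var/backups/repository', $cutoff);
```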
Re: How do I find and delete files based on age?
by andye (Curate) on Feb 26, 2007 at 15:04 UTC
    Hi macvsog,

    You won't have a problem once you get into it; this is bread-and-butter stuff for Perl.

    Things you'll want to take a look at:
    opendir
    readdir
    stat
    time
    unlink
    system
    and as others have pointed out, there's a bunch of modules to help you with this kind of thing too... best of luck with it. HTH!

    update: oh, and maybe substr as well, if you don't want to get into regular expressions quite yet...

Re: How do I find and delete files based on age?
by scorpio17 (Monsignor) on Feb 26, 2007 at 15:11 UTC
    As you know, "there's more than one way to do it...", but this may help get you started:
    #!/usr/bin/perl
    use strict;
    use File::Find;

    if ($ARGV[0] eq "") { $ARGV[0]="."; }

    my @file_list;
    find ( sub {
        my $file = $File::Find::name;
        if ( -f $file && $file =~ /^DATE_/) {
            push (@file_list, $file)
        }
    }, @ARGV);

    my $now = time(); # get current time
    my $AGE = 60*60*24*14; # convert 14 days into seconds
    for my $file (@file_list) {
        my @stats = stat($file);
        if ($now-$stats[9] > $AGE) { # file older than 14 days
            print "$file\n";
        }
    }
    Assuming you name this script cleanup.pl, you would use it like this:
    cleanup.pl /var/backups/repository
    If you don't specify a directory, it will use whatever the current directory is. Note that the stat function returns a list of info, which I'm saving into the @stats array. Element 9 contains the last modification time, which may be different from the actual creation time (read up on stat so you know which one you want to use).

    Also, this example just prints out the files starting with DATE_ that are 14 days old (or older). Change the print statement to:

    unlink $file;
    to actually delete them. This may leave you with empty directories, but you can write another script to delete empty directories after running this one.
      use strict;

      Why not

      use warnings; # as well?
      if ($ARGV[0] eq "") { $ARGV[0]="."; }

      Later on you say: "If you don't specify a directory, it will use whatever the current directory is." Had you turned warnings on, this would trigger an 'uninitialized' warning. Which is sensible: $ARGV[0] would actually be undefined rather than strictly equal to the empty string. I would use the simpler

      @ARGV = '.' unless @ARGV;

      so that all the directories supplied on the command line would be searched, and a reasonable default would be provided if none is specified. Granted: this is not meant as a harsh critique of your code. I know it is just an example. I only want to expand a little bit on the subject.

      my @file_list;
      find ( sub {
          my $file = $File::Find::name;
          if ( -f $file && $file =~ /^DATE_/) {
              push (@file_list, $file)
          }
      }, @ARGV);

      Two things:

      1. I like to use File::Find's no_chdir mode, so that I wouldn't need $File::Find::name. As of now your code is actually wrong, since find() is changing dir, so that $file which is a path relative to the base dir being searched, will be interpreted relative to the cwd, and -f will most likely fail, except for coincidences;
      2. I used to write such code too, which first collects filenames and then processes them later. If huge volumes of files are to be skimmed through, though, this may make the script seemingly "hang" before it says something interesting. Thus nowadays I avoid doing so, if possible. In this particular case I see no reason why the check on the date couldn't be made in the sub that is supplied to find() in the first place. (OK, the resulting code wouldn't do exactly the same as yours, the difference being a few seconds or at most minutes whereas the threshold is measured in days, so I wouldn't regard it as significant.)

      my @stats = stat($file);
      if ($now-$stats[9] > $AGE) { # file older than 14 days

      I know you probably know, and include an intermediate passage for clarity and instructive purposes, but it is perhaps worth reminding that one can take a list slice as well, and that the temporary @stats variable is not needed:

      if ($now-(stat $file)[9] > $AGE) { # file older than 14 days

      BTW: I am the first one to say one shouldn't care about premature optimization, but stats are known to be expensive, and $file is already statted when it's being searched, so one more reason to do the check at find() time.
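Putting those remarks together, a sketch of doing the age check inside the find() callback with no_chdir might look like this. The sub name for_each_old_file and its callback interface are invented for illustration; it matches DATE_ against the basename only, since with no_chdir the $_ seen by the callback is the full path.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Call $action->($path) for every plain file under @dirs whose name starts
# with DATE_ and whose mtime is more than $max_age seconds in the past.
sub for_each_old_file {
    my ($max_age, $action, @dirs) = @_;
    @dirs = ('.') unless @dirs;       # default to the current directory
    my $now = time;
    find({
        no_chdir => 1,                # find() stays put; $_ is the full path
        wanted   => sub {
            my $path = $_;
            return unless basename($path) =~ /^DATE_/;
            my @st = lstat $path or return;   # one stat call per candidate
            return unless -f _;               # reuse that stat; skips symlinks
            $action->($path) if $now - $st[9] > $max_age;
        },
    }, @dirs);
}

# Example use: print first, switch to unlink once the output looks right.
# for_each_old_file(14 * 24 * 60 * 60, sub { print "$_[0]\n" }, @ARGV);
```

Passing the action as a code reference makes it easy to swap print for unlink without touching the traversal.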

Re: How do I find and delete files based on age?
by Moron (Curate) on Feb 26, 2007 at 15:14 UTC

      I would also highlight strongly to any 'Newbies', especially those unfamiliar with *nix environments not to run this one liner before making backups.

      Because it deletes things. And things that delete other things should always be tested first

      -=( Graq )=-

        Surely the backup is more likely to go wrong than the one liner! Instead a newbie should only be working unsupervised on a non-production machine and backups should be done daily (and in this case also on demand) by properly-qualified staff for all machines including non-production.

        Moreover, Anybody, not just newbies, can make a mistake and it is rather comforting to have the capability to call your friendly sysadmin to restore the damaged goods back to a previous state if there's no quicker repair available.

        I remember having a misunderstanding in a Q/A system over the first digit in an identifier once and deleting everything BUT the data I was supposed to be deleting. Fortunately, a phone call and ten minutes later it was back to where it was.

        -M

        Free your mind

      Touching on what was replied to you, I'd suggest replacing "rm -rf $_" with "echo rm -rf $_". At least then you can audit it until it works perfectly. ^^
        Yes, that I do agree with. And "print" if it's not a shell-out. I also quite often put an echo in front of the perl -e (update: for long or multiple lines being typed in) to check that I typed what I think I typed before actually running it.

        -M

        Free your mind

        I don't count myself as a newbie, but I always do a test like you suggest (print plain output) before unleashing a deletion script on my system--even if it's just a script to clean out a temp directory.

        It only takes a moment to make sure that you will be deleting what you think you will be deleting, and it is easily worth the time. How long will it take you to restore your files from backup? You do have backups, right?


        TGI says moo
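The "print before you delete" habit discussed in this sub-thread can be built into a script as a dry-run switch. This is only a sketch: remove_paths and its $dry_run argument are hypothetical names, not anything from the posts above.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# With $dry_run true, only report what would be deleted; otherwise delete it.
# Returns one message per path so the caller can print or log them.
sub remove_paths {
    my ($dry_run, @paths) = @_;
    my @report;
    for my $path (@paths) {
        if ($dry_run) {
            push @report, "would delete $path";
            next;
        }
        if (-d $path && !-l $path) { remove_tree($path) }
        else                       { unlink $path }
        # Check the result instead of assuming the deletion worked
        push @report, (-e $path || -l $path)
            ? "FAILED to delete $path"
            : "deleted $path";
    }
    return @report;
}

# Example use: run with 1 until the output is what you expect, then use 0.
# print "$_\n" for remove_paths(1, @candidates);
```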

Re: How do I find and delete files based on age?
by Anonymous Monk on Feb 26, 2007 at 19:32 UTC
    Look into the File::Glob module to get your file list (perldoc File::Glob).
    You can use regular expressions to look for filename matches (perldoc perlre), it'll look something like:
    if ($filename =~ m/^DATE_/)
    For the date comparison, are you going to use a system date, or is there some sort of naming convention that integrates the date into the file name? If the latter, regular expressions are your friend again.
Re: How do I find and delete files based on age?
by duckyd (Hermit) on Feb 26, 2007 at 20:44 UTC
    If your task really is as simple as you describe (and you don't anticipate it becoming more complicated later on) then there's no reason not to just use find:

    find ./ -name 'DATE_*' -mtime +14 -exec rm -rf {} \;

    Back up first, and test before you run (without the -exec rm -rf {} \;) to verify it finds the right files, etc.

      I have never used it but I believe find2perl will convert the above command into pure perl. Just another option :)
      The use of a relative path is a good thing, but this is incomplete. Paranoia should take control when you use a destructive command, and you should never make assumptions.

      Here is a simplified example where the tests are inadequate:

      cd /targetdir/targetsubdir
      rm -fr *

      Imagine if the target directory was not mounted, or your chdir failed for whatever reason (e.g. inadequate permissions). Yes, you are likely now listening to the whirr of your hard disk working feverishly to delete everything from the directory you were in prior to the failed cd, and I have personally witnessed cases of that particular directory being / .

      The safe approach is:

      > cd $TARGETDIR && rm -fr ./targetsubdir

      or

      > test -d $TARGETDIR && find . -name 'DATE_*' -type f -mtime +14 -exec rm -fr {} \;

      Niel
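The same guard carries over to Perl: check the return value of chdir and stop before anything destructive runs. A minimal sketch, where date_entries_in is a made-up name and the glob pattern assumes the DATE_ prefix from the original question:

```perl
use strict;
use warnings;

# Change into $dir and return the DATE_* names found there.
# Dies before touching anything if the chdir fails, so later code
# never runs against whatever directory we happened to start in.
sub date_entries_in {
    my ($dir) = @_;
    chdir $dir or die "Cannot chdir to $dir: $!\n";
    return glob 'DATE_*';
}

# Example use:
# my @candidates = date_entries_in('/var/backups/repository');
# print "$_\n" for @candidates;
```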

Re: How do I find and delete files based on age?
by talexb (Canon) on Feb 27, 2007 at 04:22 UTC

    You're developing for Linux/Unix, right? I'm surprised that no one's mentioned tmpwatch yet. It's a tool that is specifically written to get rid of files older than a particular age.

    True, there's no Perl involved -- but sometimes the best answer is to not use Perl at all.

    Update: .. And here's a link to the first page that Google found for tmpwatch. You can also find information about it by typing man tmpwatch on your Unix/Linux system.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      WARNING: The following text is much like the ramblings of an old man, etc ...

      Ahhh ... the ol' "remove old files" script. Had to do a few of these,
      over the years, at numerous companies. It is usually as a result of the
      file system filling up with "old" log files, etc (eg log001, log002, etc).

      The easiest thing to do back then was to run the one-liner Unix "find" command
      with the "-mtime" followed by the "-exec" or "-delete" flag
      (see http://unixhelp.ed.ac.uk/CGI/man-cgi?find). Just an option to consider ...

      ... Anyway the important thing here, however, was that in the early days I made a complete mess
      of things when I didn't TEST the script first.

      Nowadays I have a scheduled task that archives/zips the old (1 week) files
      to another directory (much like a trashcan). Another task removes or deletes
      these files from the archived directory sometime later (if they are 2 or
      more weeks old). Of course, you can manually delete the files at any time
      knowing that they have been backed-up on tape by the system administrators - right?

      Some tips (hopefully it's not too late ...)
      In your script (perl or otherwise):

      * Have an option to only display the files to be deleted - or to display before deletion (with a confirmation)

      * Have an option to archive rather than delete old files (eg move to another directory and gzip)

      * Have an option to "restore" files from the archive (ie an "undelete")

      * As you become more confident, allow handling of files via "regular expression" - handy for file names containing unusual characters or spaces, etc.

      * Perhaps you can consider searching for files based on file attributes such as file sizes and (modified) dates (use ranges) and file types

      * Log what has been archived (or restored) or deleted, the time and the *user id*

      * If this is your first script - ever - and you are basically performing a "rm *.* www.*" then for goodness sake do not put your name on the script!

      - Laz.
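Two of the tips above (archive instead of delete, and log what was done and by whom) could be sketched like this. The sub name archive_file, the log line format, and the refusal to overwrite an existing archive entry are all illustrative choices, not something the post prescribes; compression is left out.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Basename qw(basename);

# Move $path into $archive_dir instead of deleting it, and append a line to
# $log_file recording when, who, and what. Returns the new path.
sub archive_file {
    my ($path, $archive_dir, $log_file) = @_;
    my $target = "$archive_dir/" . basename($path);
    die "Refusing to overwrite $target\n" if -e $target;
    move($path, $target) or die "Cannot move $path to $target: $!\n";

    # Login name of the real user id, falling back to the numeric id
    my $user = getpwuid($<) // $<;
    open(my $log, '>>', $log_file) or die "Cannot open $log_file: $!\n";
    print {$log} scalar(localtime), " $user archived $path -> $target\n";
    close $log or die "Cannot close $log_file: $!\n";
    return $target;
}

# Example use (the paths here are placeholders):
# archive_file('/var/backups/repository/DATE_x', '/var/backups/trash',
#              '/var/backups/cleanup.log');
```

A "restore" option is then just the same move in the opposite direction.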
