http://www.perlmonks.org?node_id=1015070

jagexCoder has asked for the wisdom of the Perl Monks concerning the following question:

Hi. We have a server that creates daily backups, and I wrote a script that successfully deletes backups older than 24 hours. However, the backups sometimes fail, so a fresh 24-hour backup is not available; in that case we use the most recent backup and discard the older ones. Any ideas on how, when no backup has been created in the past 24 hours, to keep the most recent backup and delete the rest? All advice appreciated. The backups are of the style mentioned above, hence I wish to use regular expressions to detect all files of that naming style.

Replies are listed 'Best First'.
Re: Delete all but the most recent backup file
by Kenosis (Priest) on Jan 24, 2013 at 05:54 UTC

    If I'm understanding you correctly, perhaps the following will be helpful:

    use strict;
    use warnings;

    chomp( my @fileNames = <DATA> );

    # Schwartzian transform: filter, attach a yyyymmdd key, sort, strip the key.
    my @sortedFileNames =
      map $_->[0],
      sort { $b->[1] <=> $a->[1] }
      map { my ( $d, $m, $y ) = /(\d+)/g; [ $_, "$y$m$d" ] }
      grep /^backup_\d\d_\d\d_\d{4}\.bak$/, @fileNames;

    # Drop the newest backup from the list so it is kept.
    shift @sortedFileNames;

    if (@sortedFileNames) {
        print "$_\n" for @sortedFileNames;
        #unlink @sortedFileNames;
    }

    __DATA__
    backup_21_01_2013.bak
    file.txt
    backup_20_01_2013.bak
    what_is_this.doc
    backup_24_01_2013.bak
    never_open_this.docx
    backup_22_01_2013.bak
    stuff.ini
    backup_23_01_2013.bak
    more_stuff.ini
    deleteOldBackups.pl

    Output (the files that would be deleted):

    backup_23_01_2013.bak
    backup_22_01_2013.bak
    backup_21_01_2013.bak
    backup_20_01_2013.bak

    If you populate @fileNames with the file names in the directory where the backups live, the grep lets only backup-patterned file names through. Then, using a Schwartzian transform, the script sorts the backup file names in descending order and shifts the first element (the most recent backup file name) off @sortedFileNames. As written, the file names remaining in @sortedFileNames are only printed, but the unlink line can be uncommented so that all but the most recent backup file are deleted.
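
    The chained transform can be easier to follow when each stage is given its own variable. The following is only a sketch of the same idea, not a replacement for the code above; the intermediate names (@backups, @pairs, @sortedPairs, $newest, @toDelete) and the sample file names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample names, standing in for a real directory listing.
my @fileNames = qw(
    backup_21_01_2013.bak file.txt
    backup_24_01_2013.bak backup_20_01_2013.bak
);

# Stage 1 (grep): keep only names that look like backups.
my @backups = grep { /^backup_\d\d_\d\d_\d{4}\.bak$/ } @fileNames;

# Stage 2 (map): pair each name with a sortable key, as [ name, yyyymmdd ].
my @pairs = map {
    my ( $d, $m, $y ) = /(\d+)/g;
    [ $_, "$y$m$d" ];
} @backups;

# Stage 3 (sort): newest first, comparing the key stored at index 1.
my @sortedPairs = sort { $b->[1] <=> $a->[1] } @pairs;

# Stage 4 (map): discard the keys and keep the names stored at index 0.
my @sortedFileNames = map { $_->[0] } @sortedPairs;

# The first element is the newest backup; the rest are deletion candidates.
my ( $newest, @toDelete ) = @sortedFileNames;
print "keep:   $newest\n";
print "delete: $_\n" for @toDelete;
```

    In the chained form the data flows from the last line (grep) up to the first (map), because each function takes the list produced by the expression to its right.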

    ** Please thoroughly test and verify this on a copy of the backup directory before going live with it. **

      Hi there! So I did some reading and have some queries:

      (1) Why is the Schwartzian transform read from bottom to top? In procedural programming, code is usually read top to bottom. After reading the Wikipedia article it makes sense, but I'm curious about why this method behaves that way.


      (2) Could you please explain the expression used in map {}? { my $stat = stat $_; [ $_, $stat->mtime ] } I understand that a scalar variable called $stat has been defined from the 'default variable' $_. My understanding is that when map() is run (the map at the bottom), it evaluates the expression within {} for each element in @fileNames and stores it in the default variable. Does this mean that the default variable changes each time an element is passed?

      (3) Further to (2), I understand that "$stat->mtime" gets the last-modified time since the epoch for each value of $stat; my understanding is that each time the next element of the array is passed, the mtime for that particular $stat is obtained. So what is the meaning of [ $_, $stat->mtime ], given that a comma separates the two?

      (4) The semicolon in { my $stat = stat $_; [ $_, $stat->mtime ] } separates the two statements within this single expression. Is that correct?

      (5) I understand that sort { $b->[1] <=> $a->[1] } performs a descending numeric sort. What I don't get is the [1] that $b and $a both share. Also, what is the relationship between $b and $a? I found an example of this type of sort on the net, but it did not explain why $b and $a are used. Do they simply represent two different positions in a list?

      (6) The map $_->[0] does not appear to follow the format map({expression}, list). How is this different from the standard map function?

      Thanks for your help! Sorry for all the questions; I'm just clarifying my doubts.

      Thanks, it certainly is useful and does perform the deletion. However, I have two queries:

      (1) Komodo Edit reports:

      Name "main::DATA" used only once: possible typo at bk_remove.pl line 73.
      readline() on unopened filehandle DATA at bk_remove.pl line 73.

      This refers to <DATA> in the code.

      (2) It appears the code does the deletion based on the date in the filename. While this is ideal, my previous code works from the modification time only (using -M), not the date in the filename, so I would like to keep it that way. Any ideas on how I could modify this to work the way I did it? Thanks very much for all the answers. Apologies, I am not very good at Perl, though I do get the odd script done here and there when needed. Thanks again!

        You're most welcome!

        Yes, Komodo appears to just be alerting you about <DATA>, but should certainly know better, since there's a __DATA__ section.

        Try the following:

        use strict;
        use warnings;
        use File::stat;

        chomp( my @fileNames = <*.bak> );

        # Sort by modification time, newest first.
        my @sortedFileNames =
          map $_->[0],
          sort { $b->[1] <=> $a->[1] }
          map { my $stat = stat $_; [ $_, $stat->mtime ] }
          grep /^backup_\d\d_\d\d_\d{4}\.bak$/, @fileNames;

        # Drop the newest backup from the list so it is kept.
        shift @sortedFileNames;

        if (@sortedFileNames) {
            print "$_\n" for @sortedFileNames;
            #unlink @sortedFileNames;
        }

        This stats each file for the modification time, using it in the sort. Also, note that a file glob's used to read directory files...

        I guess you'd have to replace the line
        map { my ( $d, $m, $y ) = /(\d+)/g; [ $_, "$y$m$d" ] }
        by something like
        map {[$_, -M $_]}
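        However, I don't use these functions often at the moment, so you should of course test it first. Note that -M returns a file's age in days, so a smaller value means a newer file: with the descending sort left unchanged, you would end up keeping the oldest backup instead of the newest. Changing the comparison to $a->[1] <=> $b->[1] puts the newest first.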
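
        To check the sort direction without touching real backups, something like the following sketch could be run first. It builds a temporary directory with made-up backup files (the names and ages are assumptions for illustration), sets their modification times with utime, and then sorts on -M in ascending order so that the newest file comes first.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with three hypothetical backups of different ages.
my $dir = tempdir( CLEANUP => 1 );
my %ageInDays = (
    'backup_22_01_2013.bak' => 3,
    'backup_23_01_2013.bak' => 2,
    'backup_24_01_2013.bak' => 1,
);
for my $name ( keys %ageInDays ) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
    my $mtime = time - $ageInDays{$name} * 24 * 60 * 60;
    utime $mtime, $mtime, $path or die "Cannot set mtime on $path: $!";
}

# -M is the age in days, so the smallest value belongs to the newest file:
# sort ascending ($a before $b) to put the newest first.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @sortedPaths =
  map  { $_->[0] }
  sort { $a->[1] <=> $b->[1] }
  map  { [ "$dir/$_", -M "$dir/$_" ] }
  grep { /^backup_\d\d_\d\d_\d{4}\.bak$/ } readdir $dh;
closedir $dh;

# Keep the newest; whatever remains would be passed to unlink.
my $newest = shift @sortedPaths;
print "keep:   $newest\n";
print "delete: $_\n" for @sortedPaths;
```

        The file with the most recent modification time is the one kept, regardless of the date in its name.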
Re: Delete all but the most recent backup file
by vinoth.ree (Monsignor) on Jan 24, 2013 at 06:01 UTC
    Any ideas on if a backup has not been created in past 24hrs that the most recent backup is kept and the rest are deleted?

    So you first need to check whether a backup file was created within the last 24 hours; if so, delete all the backup files that are older than 24 hours.

    To find the files created within the last 24 hours, try this:

    if (1 > -M $filename) {
        # Your code goes here
    }

    If any such files are found, do your normal delete (which removes backups older than 24 hours); otherwise, skip the deletion part.
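
    Putting this check together with the fallback asked for in the question (keep only the most recent backup when none is fresh) might look like the sketch below. The function name backups_to_delete, the file names, and the ages are all hypothetical; the ages stand in for what -M would report for each file.

```perl
use strict;
use warnings;

# Given a hash of file name => age in days (what -M would report),
# return the names that should be deleted.
sub backups_to_delete {
    my (%ageOf) = @_;

    # Newest first: the smallest age sorts to the front.
    my @newestFirst = sort { $ageOf{$a} <=> $ageOf{$b} } keys %ageOf;
    return () unless @newestFirst;

    # Normal case: at least one backup is under a day old, so drop everything older.
    if ( $ageOf{ $newestFirst[0] } < 1 ) {
        return grep { $ageOf{$_} >= 1 } @newestFirst;
    }

    # Fallback: no fresh backup exists, so keep only the most recent one.
    shift @newestFirst;
    return @newestFirst;
}

# Hypothetical ages; a real script could build these with
# something like: map { $_ => -M $_ } @backupFiles
my %fresh = ( 'a.bak' => 0.5, 'b.bak' => 1.5, 'c.bak' => 2.5 );
my %stale = ( 'b.bak' => 1.5, 'c.bak' => 2.5, 'd.bak' => 3.5 );

print "fresh run deletes: @{[ backups_to_delete(%fresh) ]}\n";
print "stale run deletes: @{[ backups_to_delete(%stale) ]}\n";
```

    Keeping the decision in a function that only sees names and ages means it can be tested with made-up data before unlink is ever called on the result.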

Re: Delete all but the most recent backup file
by kielstirling (Scribe) on Jan 24, 2013 at 04:44 UTC
    Hey, not sure if I understand the question, but this will give you the last modified file.

    #!/usr/bin/perl
    use Modern::Perl;
    use IO::Dir;

    my $path = shift;
    my $dir  = IO::Dir->new($path);

    # Map each non-hidden entry to its modification time (element 9 of stat).
    my %files;
    for my $file (grep !/^\./, $dir->read) {
        $files{$file} = (stat("$path/$file"))[9];
    }

    # Sort newest first so that index 0 is the most recently modified file.
    my $file = (sort { $files{$b} <=> $files{$a} } keys %files)[0];
    say "$file ", scalar localtime($files{$file});
    undef $dir;