PerlMonks
Deleting Files

by BatGnat (Scribe)
on Feb 23, 2001 at 05:52 UTC ( #60413=perlquestion )

BatGnat has asked for the wisdom of the Perl Monks concerning the following question:

Hey Guys + Girls.
I need a quick way to delete files older than 12 weeks (3 months).
When I say quick, I mean in relation to the 450,000+ files in that directory. My own experience led me to something like this:

    opendir(DIR, "c:/directory") or die $!;
    foreach (readdir(DIR)) {
        Del_File() if -M "c:/directory/$_" > 12 * 7;   # -M gives age in days
    }
    closedir(DIR);

But as you can see this could be rather slow, so does anybody know of a quick way of doing this sort of thing on a large number of files? I am running W2k.

BatGnat

BALLOT: A multiple choice exam, in which all of the answers above are incorrect!

Replies are listed 'Best First'.
Re: Deleting Files
by Adam (Vicar) on Feb 23, 2001 at 06:54 UTC
    How about something like:
    #!perl -w
    use strict;
    use File::Find;

    my $days = 12 * 7;   # twelve weeks, seven days a week.

    sub DeleteOldFiles {
        return 0 if -d;                 # skip directories
        return 0 unless -M > $days;
        unlink $_ or die $!;
        return 1;
    }

    find( \&DeleteOldFiles, '.' );
    Although I think that also sweeps subdirectories.

    Update
    Hmmm, File::Find is really for delving into the subdirs, and if you don't want that, then just glob(*) like this:

    #!perl -w
    use strict;

    my $days = 12 * 7;   # twelve weeks, seven days a week.

    while ( <*> ) {
        next if -d;
        unlink $_ or die $! if -M > $days;
    }
    By the way, -M returns days since last modified, -A returns days since last accessed. I wasn't sure what your purpose was, but recently accessed files might be useful.

    As for speed issues: you have to loop over every file no matter what; the only question is the efficiency of the loop. So do as little as possible in the loop. Calculate the age beforehand, and short-circuit where you can.
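    That advice could be sketched as follows. This is my own illustration, not code from the post: the helper name delete_older_than and the hard-coded directory are assumptions. The point is to compute the cutoff once as an epoch time outside the loop, then do a single stat per file rather than repeated -M arithmetic.

```perl
#!perl -w
use strict;

# Hypothetical helper: delete files in $dir whose mtime is older than
# $days days. The cutoff is computed once, outside the loop.
sub delete_older_than {
    my ( $dir, $days ) = @_;
    my $cutoff  = time() - $days * 24 * 60 * 60;   # epoch seconds
    my $deleted = 0;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    while ( defined( my $name = readdir $dh ) ) {
        my $path = "$dir/$name";
        next if -d $path;                          # skips '.', '..', subdirs
        my $mtime = ( stat $path )[9];             # one stat per file
        if ( $mtime < $cutoff ) {
            unlink $path or die "Can't unlink $path: $!";
            $deleted++;
        }
    }
    closedir $dh;
    return $deleted;
}

# Example (path is a placeholder):
# delete_older_than( 'c:/directory', 12 * 7 );
```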

      Adam writes:

      Hmmm, File::Find is really for delving into the subdirs,

      I know, and it's difficult to grok sometimes how to use it correctly - the whole "prune/not prune" thing is still unclear to me. But besides that...

      and if you don't want that, then just glob(*) like this:

      For a while now I've been pondering which was faster, a glob or a readdir, so I decided to test it on a big directory I've got lying around (36K files).

      The answer?

      glob failed (child exited with status 1) at trial.pl line 12.

      Nuts. Do I recall somewhere that Perl relies on the shell for globbing in some way? If so, going with a readdir on a big directory may be your only choice.
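      For what it's worth, the glob-versus-readdir question can be measured with the core Benchmark module. This is only a sketch of such a test, not the trial.pl from the post; the directory and iteration count are arbitrary placeholders.

```perl
#!perl -w
use strict;
use Benchmark qw(timethese);

# Compare glob against readdir on the same directory. Point $dir at a
# large directory to reproduce the experiment; '.' is just a placeholder.
my $dir = '.';

timethese( 100, {
    'glob'    => sub {
        my @files = glob "$dir/*";
    },
    'readdir' => sub {
        opendir my $dh, $dir or die "Can't open $dir: $!";
        my @files = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
    },
});
```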

      As for speed issues, you have to loop over every file, no matter what,

      And, although I can't speak for NTFS/HPFS/FAT filesystems, I know that a flat directory on Solaris or Linux is going to be a serious dog to scan once it holds over 1,000 files, no matter what.

      Implement a hierarchical subdirectory scheme - maybe based on the date, which would simplify purging, too.
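      A date-based scheme might look something like this. It's a sketch of the idea only: the helper name bucket_by_month and the YYYY/MM layout are my assumptions, not anything from the post. The payoff is that purging three-month-old images becomes removing a few month directories instead of scanning 450,000 entries.

```perl
#!perl -w
use strict;
use POSIX qw(strftime);
use File::Path qw(mkpath);
use File::Basename qw(basename);

# Hypothetical helper: move a file into a YYYY/MM subdirectory chosen
# from its mtime, creating the directory if needed.
sub bucket_by_month {
    my ( $root, $path ) = @_;
    my $mtime  = ( stat $path )[9];
    my $bucket = strftime( '%Y/%m', localtime $mtime );
    mkpath("$root/$bucket");
    my $dest = "$root/$bucket/" . basename($path);
    rename $path, $dest or die "Can't move $path: $!";
    return $dest;
}
```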

      That's what I did, faced with a similar problem, anyway. :-)

      Peace,
      -McD

        With Perl 5.6 globbing no longer uses the shell.

        As for the filesystem, ReiserFS on Linux is supposed to handle that kind of directory smoothly. However, ext will slow to a crawl, and NTFS appears to as well.

        The problem, of course, is that every mention of a file requires scanning the list of things in the directory, which means that you scan a list of many thousands of files many thousands of times. Unless the filesystem is designed for that, you have a problem.

        Recommended solutions? Hierarchical structures (which is what most filesystems are designed to do), a dbm, a

      The 450,000+ files are actually fax images. The directory used to be cleaned once a week when the system ran on OS/2, but when they converted it to NT, they failed to implement a purge.
      We now need to rectify that.

      BatGnat

        Get an NT find and:
        find /image_dir -c +7d -exec rm {} \;

