PerlMonks  

Finding oldest file in directory

by Nitrox (Chaplain)
on Oct 18, 2004 at 18:11 UTC (#400255, perlquestion)
Nitrox has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that needs to determine the oldest file in a particular directory, and I'm concerned about the efficiency of my current solution. (It runs every 60 seconds, which is why I'm concerned with optimization.) Here's an example snippet:

my $dir  = ".";
my $file = (sort { (stat $a)[10] <=> (stat $b)[10] } glob "$dir/*.pl")[0];

This script runs across multiple platforms (Win32, Solaris, Linux and AIX), which rules out some perhaps "easier" solutions.

Another important piece of info is that the directory is relatively small and has no more than 10 files at any given time, so I wasn't concerned about the numerous stat calls.

So is my current solution acceptable and am I just trying to micro-optimize, or does anyone see a glaring performance issue?

Thanks in advance for feedback!

-Nitrox

Re: Finding oldest file in directory
by merlyn (Sage) on Oct 18, 2004 at 18:25 UTC
    Probably only the teeniest bit faster to run, but a lot easier to type, I would have picked -s $a over (stat $a)[10] and so on.

    Also, above some number of files (depending on your OS efficiency), it'd be faster to cache your stats for the sort.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: Darn it. I misread [10] as wanting the size, even though there were other clues in the message about wanting the oldest.

    OK, yes, replace -s there with -M.

      "it'd be faster to cache your stats for the sort"

      It depends on what "oldest" means, and on how files are created, modified and removed from the directory. The cached info might not be correct or useful. It probably just increases the complexity of the program; with 10 or so files in the directory, it is most likely not worth it.

        I think what Randal L. Schwartz was referring to, when he said "cache it for the sort," was to use a very common sort optimization technique called, not coincidentally, the Schwartzian Transform.
        @sorted = map  { $_->[0] }
                  sort { $a->[1] <=> $b->[1] }
                  map  { [ $_, (-s $_) ] }
                  @unsorted;

        --
        [ e d @ h a l l e y . c c ]

        If the underlying files are changing quickly enough that -s isn't going to return the same result, you're probably already screwed (and I want to say that some qsort implementations might even core on you).

      Do you mean '-C' instead of '-s' (file size)?
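For reference, here is a minimal sketch of the same transform keyed on modification time rather than size. It assumes "oldest" means smallest mtime (field 9 of stat), which the original post does not confirm, and the sub name files_oldest_first is made up for illustration.

```perl
use strict;
use warnings;

# Return the files matching $pattern, oldest first, with one stat per file.
# Assumption: "oldest" means smallest mtime (field 9 of stat).
# Note: glob splits its pattern on whitespace, so this sketch assumes
# the directory path contains no spaces.
sub files_oldest_first {
    my ($pattern) = @_;
    return map  { $_->[0] }                    # 4. unwrap the file name
           sort { $a->[1] <=> $b->[1] }        # 3. compare the cached mtimes
           grep { defined $_->[1] }            # 2. skip files that vanished
           map  { [ $_, (stat $_)[9] ] }       # 1. pair each name with its mtime
           glob $pattern;
}

my ($oldest) = files_oldest_first('./*.pl');
print defined $oldest ? "$oldest\n" : "no matching files\n";
```

With ten files the saving over the plain sort is small; the point is only that each file is stat'ed exactly once.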
Re: Finding oldest file in directory
by TomDLux (Vicar) on Oct 18, 2004 at 18:45 UTC

    If it never has more than 10 files, who cares? If it's fast enough, don't worry. If you need more speed, benchmark and profile.

    Except that sorting N values involves N log N to N^2 comparisons, and if each comparison involves two stat calls at 10 ms each, it does waste system resources. The Schwartzian Transform, which pairs each file name with its stat time before sorting, involves only N stats and would be economical.

    If files are not going to change, keep a list of known files in a hash, and obtain a list of eligible files. If there are any new files, stat only the ones you don't already know about.
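A rough sketch of that caching idea, assuming files really are never modified after creation and that the script stays running between checks. The names %mtime_of and oldest_with_cache are made up for illustration, and mtime is used as the age.

```perl
use strict;
use warnings;

my %mtime_of;    # hypothetical cache: file name => mtime seen at first stat

# Assumes files never change once created, so one stat per file is enough.
sub oldest_with_cache {
    my ($pattern) = @_;
    my %present = map { $_ => 1 } glob $pattern;

    # Forget files that have disappeared since the last call.
    delete @mtime_of{ grep { !$present{$_} } keys %mtime_of };

    # stat only the files we have not seen before.
    for my $file (grep { !exists $mtime_of{$_} } keys %present) {
        my $mtime = (stat $file)[9];
        $mtime_of{$file} = $mtime if defined $mtime;
    }

    # Pick the entry with the smallest cached mtime.
    my $oldest;
    for my $file (keys %mtime_of) {
        $oldest = $file
            if !defined $oldest || $mtime_of{$file} < $mtime_of{$oldest};
    }
    return $oldest;
}
```

The cache only pays off in a long-running process; if the script is launched fresh every minute, the hash starts empty each time.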

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Finding oldest file in directory
by jdporter (Canon) on Oct 18, 2004 at 18:48 UTC
    my $cmd = $^O =~ /Win32/ ? 'dir /od /b' : 'ls -1rt';
    my( $file ) = qx( $cmd );
    chomp $file;
Re: Finding oldest file in directory
by pg (Canon) on Oct 18, 2004 at 18:54 UTC

    glob has actually been implemented on top of File::Glob since 5.6.0, so turning on GLOB_NOSORT might help a bit, as you want your own sort, not its sort, anyway.

    use File::Glob ':glob';
    @list = bsd_glob('*.*', GLOB_NOSORT);
    print join(',', @list);
Re: Finding oldest file in directory
by Roy Johnson (Monsignor) on Oct 18, 2004 at 19:12 UTC
    Finding the oldest/most recent/minimum/maximum of a list does not require sorting. For a 10-file directory, it's not a big deal, but the right tool for the job is a simple max-finder:
    my $oldest;
    my $oldtime = 0;
    for (glob "$dir/*.pl") {
        my $thistime = -C;
        if ($thistime > $oldtime) {
            ($oldest, $oldtime) = ($_, $thistime);
        }
    }
    You could do this at kind of the same programming level as you're trying to by using List::Util 'reduce':
    use List::Util 'reduce';
    my $file = (
        reduce { $a->[0] < $b->[0] ? $a : $b }
        map    { [ (stat)[10], $_ ] }
        glob '*.pl'
    )->[1];
    That makes for somewhat complicated reading, though, and might be better broken into more steps.

    Update: for posterity: the map above is only useful for reducing the number of times stat is called, from 2*N to N. The overhead of map and storing the values and dereferencing is probably not worth it. It's certainly simpler to say

    my $file = reduce {(stat $a)[10] < (stat $b)[10] ? $a : $b} glob '*.pl';

    Caution: Contents may have been coded under pressure.
Re: Finding oldest file in directory
by bluto (Curate) on Oct 18, 2004 at 19:25 UTC
    I wouldn't use the 'ctime' field (nor '-C' for that matter) since its implementation varies depending on platform, and it doesn't necessarily indicate the age of a file's data, just the file's metadata. You'll probably want to use mtime (write time) or atime (access: most recent read or write) instead. See "perldoc perlport" and look for ctime.

    I also wouldn't consider optimizing this much since any reasonable OS will cache your stats for you (and you only have a few files). These stats may even stay cached if you are reading them once a minute.
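To make the field numbers concrete, here is a small sketch. The helper names file_times and age_in_days are made up for illustration; the stat indices (8 = atime, 9 = mtime, 10 = ctime) and the relationship between -M and $^T are as documented in perlfunc.

```perl
use strict;
use warnings;

# Return the three timestamps of a file as a hash, or an empty list
# if the file cannot be stat'ed.
sub file_times {
    my ($file) = @_;
    my ($atime, $mtime, $ctime) = (stat $file)[8, 9, 10];
    return unless defined $mtime;
    return (atime => $atime, mtime => $mtime, ctime => $ctime);
}

# Age in days relative to script start time ($^T), computed from mtime.
# This is the same quantity the -M file test reports.
sub age_in_days {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    return unless defined $mtime;
    return ($^T - $mtime) / 86400;
}
```

So swapping [10] for [9] in the original snippet, or -C for -M, is all it takes to compare on modification time instead.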

Re: Finding oldest file in directory
by ikegami (Pope) on Oct 18, 2004 at 21:08 UTC

    You mentioned portability was a requirement, so you should use File::Spec to build paths. "/" is not the file separator on Macs, for example.

    I have a script that needs to determine the oldest file in a particular directory

    If all you're concerned about is which file is the oldest, there's no need to sort:

    use File::Spec ();

    sub get_oldest {
        my ($dir) = @_;
        my $oldest;
        my $oldest_time;
        my $file_spec = File::Spec->catfile($dir, '*.pl');
        foreach (glob $file_spec) {
            my $time = (stat $_)[10];
            if (!$oldest_time || $time < $oldest_time) {
                $oldest = $_;
                $oldest_time = $time;
            }
        }
        return $oldest;
    }

    I don't know how efficient glob is. You can get rid of it:

    use DirHandle ();
    use File::Spec ();

    sub get_oldest {
        my ($dir) = @_;
        my $oldest;
        my $oldest_time;
        my $dh = DirHandle->new($dir);
        while (defined($_ = $dh->read())) {
            next unless (/\.pl$/i);
            my $full_path = File::Spec->catfile($dir, $_);
            my $time = (stat $full_path)[10];
            if (!$oldest_time || $time < $oldest_time) {
                $oldest = $_;
                $oldest_time = $time;
            }
        }
        return $oldest;
    }
      Hi, I'm lazy but since I wrote a program called filexer to sync uploads I'll just make a couple of nitpicking suggestions.

      If you are dealing with remote mounting of windows shares do a lot of testing. In particular permissions and you-can't-get-there-from-here took a lot of my time.

      I used a cygwin binary at one point to solve a problem windows wasn't helping me with. It was a while ago and I don't have the code online right now, but I'm pretty sure I used cygwin's touch command.

      Granularity < 1 second might flub it.

      Watch for illegal characters (esp. colons and question marks), non-western encodings, filename lengths, etc. if you are actually copying across the net like I did. If so, there are possibly security issues as well.

      Maybe loading the interpreter and scanning the program are going to take a while too. Consider running under mod_perl (even if just under CGI emulation) and calling it once every 10 seconds via a crontab? I don't want to think of any Perl program being launched on my system from scratch every 10 seconds. That is, can you just keep the thing running all the time instead of quitting it after 10 seconds? Much better then, I think.

      Have fun!

Re: Finding oldest file in directory
by Eyck (Priest) on Oct 19, 2004 at 12:36 UTC

    Why sort at all?

    The problem is to find the oldest file, that's it.

    Just walk the list of files and compare every one to the 'currently oldest'; this would be the most efficient solution.

    I'm shocked that people actually suggested the Schwartzian Transform and similarly overgrown solutions to such a simple problem.

      Careful. The high-water-mark algorithm is actually slower than sorting to get the highest value, for some small number of items. Think of the few lines of Perl code that would have to be repeatedly executed for each item. Then think of how little work it takes directly in C to sort that list instead.

      Yes, surprising when I first heard it too.
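Anyone wanting to check this on their own machine could start from a sketch like the following, which uses the core Benchmark module. The data is a made-up stand-in (ten random timestamps, no file access), and min_by_sort, min_by_loop and run_comparison are illustrative names. Results will vary with Perl version and platform, so none are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in data: ten timestamps, as for a ten-file directory.
my @times = map { 1_098_000_000 + int rand 100_000 } 1 .. 10;

# Variant 1: sort everything (the sort itself runs in C) and take the first.
sub min_by_sort {
    return (sort { $a <=> $b } @_)[0];
}

# Variant 2: high-water-mark loop written in Perl.
sub min_by_loop {
    my $min = shift;
    for (@_) {
        $min = $_ if $_ < $min;
    }
    return $min;
}

# Compare the two; a negative $count means "run each for that many CPU seconds".
sub run_comparison {
    my ($count) = @_;
    cmpthese($count, {
        'sort' => sub { my $m = min_by_sort(@times) },
        'loop' => sub { my $m = min_by_loop(@times) },
    });
}
```

Calling run_comparison(-2) prints a rate table for the two variants. Keep in mind that with real files the stat calls are likely to dominate either way.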

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        Note that the arguments against a perl-based high-water-mark algorithm don't apply to List::Util::max, which, like sort, is written in C. (The overhead of a function call vs the overhead of other opcodes does, however, apply, but that's a very small difference.)
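A possible way to lean on List::Util here, sketched with min rather than max because the oldest file is the one with the smallest mtime. Using mtime as the age is an assumption, and oldest_via_min is a made-up name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Return the file with the smallest mtime, or nothing if none can be stat'ed.
sub oldest_via_min {
    my (@files) = @_;

    # One stat per file; remember each mtime by file name.
    my %mtime_of;
    for my $file (@files) {
        my $mtime = (stat $file)[9];
        $mtime_of{$file} = $mtime if defined $mtime;
    }
    return unless %mtime_of;

    # min() does the comparison loop in C.
    my $lowest = min values %mtime_of;

    # Map the winning timestamp back to a name; ties go to the first by name.
    my @ties = sort grep { $mtime_of{$_} == $lowest } keys %mtime_of;
    return $ties[0];
}
```

The stat loop and the final grep are still Perl-level work, so whether this beats a plain loop for ten files is something to measure rather than assume.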

Re: Finding oldest file in directory
by elwarren (Curate) on Oct 19, 2004 at 18:15 UTC
    How about adding a test to see if the directory has changed before examining every file in the dir? If the dir hasn't updated, skip the test.
      Along the same lines, test what was the oldest file last time and if it hasn't changed, it is still the oldest file.
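One way this could look, as a sketch only. It assumes a long-running process and that the directory's own mtime changes whenever an entry is added, removed or renamed. That holds on typical Unix filesystems but should be verified on each target platform, and timestamp granularity can hide two changes within the same second. The state variables and the name oldest_if_changed are made up, and the glob assumes a directory path without spaces.

```perl
use strict;
use warnings;

my ($last_dir_mtime, $cached_oldest);    # hypothetical state kept between polls

# Rescan the directory only when its own mtime has moved since the last call.
sub oldest_if_changed {
    my ($dir) = @_;
    my $dir_mtime = (stat $dir)[9];
    die "cannot stat $dir: $!" unless defined $dir_mtime;

    if (!defined $last_dir_mtime || $dir_mtime != $last_dir_mtime) {
        $last_dir_mtime = $dir_mtime;
        # Same approach as the original post, but comparing mtime (field 9).
        ($cached_oldest) =
            sort { (stat $a)[9] <=> (stat $b)[9] } glob "$dir/*.pl";
    }
    return $cached_oldest;
}
```

Note that rewriting the contents of an existing file does not normally touch the directory's mtime, so this shortcut only suits directories where files are added and removed but not edited in place.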
