PerlMonks  

grab newest file

by hasimir44 (Sexton)
on Jul 06, 2006 at 22:44 UTC

hasimir44 has asked for the wisdom of the Perl Monks concerning the following question:

Anyone have a suggestion on how to do the following without using the shell?
chomp($report = `ls $path/*B1006* |tail -n 1`);
update: I didn't mention that the *B1006* filenames include timestamps. So, the last one alphabetically always happens to be the newest. My *real* mistake is that I titled this "oldest" instead of "newest". Damn. :)
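
A small sketch of why the alphabetical trick in the update works. The real *B1006* names are not shown in the thread, so the names below are hypothetical; the assumption is that every name carries a fixed-width, zero-padded timestamp in the same position.

```perl
use strict;
use warnings;

# Hypothetical file names: the real *B1006* names are not shown in the thread.
my @names = (
    'rpt_B1006_20060705093000.txt',
    'rpt_B1006_20060706221500.txt',
    'rpt_B1006_20060630120000.txt',
);

# With a fixed-width, zero-padded timestamp, string order equals time order,
# so the last name after a plain string sort is also the newest.
my $newest_by_name = (sort { $a cmp $b } @names)[-1];
print "$newest_by_name\n";
```

If the timestamp were not zero-padded, or sat at different offsets in different names, string order and time order could disagree and a sort on modification time would be the safer choice.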

Replies are listed 'Best First'.
Re: grab oldest file
by Tanktalus (Canon) on Jul 06, 2006 at 22:53 UTC

    a) if it works, why fight it? ;-)

    b) what that does does not match what your subject says. That will give you the last file alphabetically, not the oldest file.

    If you want the same functionality:

    $report = (sort glob "$path/*B1006*")[-1];
    (the sort probably isn't needed, but not all filesystems sort their directories automatically)

    If you actually want the oldest file:

    $report = (sort { -M $a <=> -M $b } glob "$path/*B1006*")[-1]
    (The definition of "oldest" is a bit fuzzy since the info you may want may not be present. This uses modification times - probably close enough.)

      Which of these would use the least amount of resources?
      $report = (sort glob "$path/*B1006*")[-1];
      or
      chomp($report = `ls $path/*B1006* |tail -n 1`);

        The first one. However, the reason for "a)" above was not about resources used, it's more about resources available. What is the responsiveness required by your code, and is it meeting that responsiveness requirement?

        One thing about premature optimisation is that many programmers make the mistake of thinking that saving CPU time is important in and of itself. It isn't - if you don't use the CPU time, the CPU probably will sit idle instead. The delta cost in electricity probably isn't going to be noticed, and surely will be less than paying for your time or electricity in posting your question.

        If, however, your computer is running this particular piece of code thousands of times per second, so that this overhead actually matters, well, that becomes another matter. For example, on a heavily-used web server (where "heavily-used" is entirely dependent on the web server and the overall CPU intensity of the CGI code).

        Mind you, if your question was "I'm trying to get this to work on a Windows box and I'm trying to avoid requiring everyone to install the GNU tools", then that would be a very good answer to my question "a" above. In that case, it would no longer work ;-)

        As to why the first one uses fewer resources: the second one launches three processes, and sets up two pipes. The first one does everything in the current process. Even on unix/linux where fork overhead is relatively small, creating processes still is a fairly large overhead (e.g., doing lots of stuff to the process tables in the kernel, lots of memory management, etc.). The processes? Obviously, /bin/ls and /bin/tail (or wherever they are). But also /bin/sh. And perl creates a pipe from sh to itself, while sh creates a pipe from ls to tail. I'm not sure how much overhead there is in creating pipes - but it has to be more than not creating pipes ;-)
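
To make the "everything in the current process" point concrete, here is a minimal sketch that does the job of ls, tail and sh with opendir/readdir alone. The sub name `last_by_name` and its two parameters are made up for illustration; skipping dot files is an assumption meant to mimic what the shell glob `*B1006*` does.

```perl
use strict;
use warnings;

# Return the alphabetically last entry in $dir whose name contains $tag,
# or undef if nothing matches. No shell, no ls, no tail, no pipes.
sub last_by_name {
    my ($dir, $tag) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!";
    my $last;
    while (defined(my $name = readdir $dh)) {
        next if $name =~ /^\./;                 # a shell glob skips dot files too
        next unless index($name, $tag) >= 0;    # same idea as *B1006*
        $last = $name if !defined $last or $name gt $last;   # keep the greatest so far
    }
    closedir $dh;
    return defined $last ? "$dir/$last" : undef;
}
```

With the variables from the question this would be called as `my $report = last_by_name($path, 'B1006');`. It keeps only one name in memory instead of sorting the whole list.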

        If you really want to use the least resources: Updated for newest file:
        my $report = do {
            opendir my $dh, $path or die "open '$path' $!";
            my ( $name, $mtime ) = ( '', ~0 );
            while ( my $file = readdir $dh ) {
                stat "$path/$file";
                ( $name, $mtime ) = ( $file, -M _ )
                    if $file =~ /B1006/ and $mtime > -M _;
            }
            $name;
        };
Re: grab oldest file
by Hue-Bond (Priest) on Jul 06, 2006 at 22:55 UTC

    Your code only returns the oldest file if ls sorts by time. Maybe your ls is an alias. Anyway:

    my %stats;
    for (<*>) {
        my $mtime = (stat)[9];    ## for atime, use (stat)[8]
        push @{$stats{$mtime}}, $_;
    }
    my $oldest = (sort { $a <=> $b } keys %stats)[0];
    print "oldest: ", (join ',', @{$stats{$oldest}}), "\n";

    Here's another version that doesn't eat so much memory:

    my ($timestamp, $files) = ~0;
    for (<*>) {
        my $mtime = (stat)[9];    ## for atime, use (stat)[8]
        if ($mtime < $timestamp) {
            $timestamp = $mtime;
            $files = [ $_ ];
        } elsif ($mtime == $timestamp) {
            push @{$files}, $_;
        }
    }
    print "oldest: ", (join ',', @{$files}), "\n";

    I'm thinking of some way of doing it with map. This somewhat golfed version uses GRT for sorting and returns only one file:

    my $oldest = (map { substr $_, 10 } sort map { sprintf "%010d%s", (stat)[9], $_; } <*>)[0];
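
For anyone who hasn't met the GRT (Guttman-Rosler Transform) before, here is the same idea spelled out step by step. It is only a sketch: the sub name `sort_by_mtime_grt` is made up, and the 10-digit key assumes modification times that are non-negative and fit in ten decimal digits. The file name is passed to sprintf as an argument, so a `%` in a name cannot disturb the format.

```perl
use strict;
use warnings;

# Guttman-Rosler Transform, spelled out:
#   1. prefix each name with a fixed-width sort key (here a 10-digit mtime),
#   2. sort the strings with the plain default sort (no comparison block),
#   3. strip the prefix again.
# Returns the names ordered oldest first. Names that cannot be stat'ed are skipped.
sub sort_by_mtime_grt {
    my @names = @_;
    my @keyed = map {
        my $mtime = (stat $_)[9];
        defined $mtime ? sprintf("%010d%s", $mtime, $_) : ();
    } @names;
    my @sorted = sort @keyed;
    return map { substr $_, 10 } @sorted;
}
```

Usage would be something like `my ($oldest) = sort_by_mtime_grt(glob '*');`. Files with identical modification times come out in name order, because the name is part of the sorted string.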

    Update: Fixed sort numerically vs ASCIIbetically. Added two more versions.

    --
    David Serrano

Re: grab newest file
by shmem (Chancellor) on Jul 07, 2006 at 06:15 UTC
    I'd do this with glob, -M and sort:
    $report = (sort{-M $a <=> -M $b}<*>)[0];
    While this is short, sorting all files by date only to get at the newest one is a waste, especially for large directories. Also, it does a -M at each comparison in sort.

    Using the Schwartzian Transform

    $report = ( map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [-M $_, $_] } <*> )[0];
    eliminates multiple -M calls, but uses more memory, since we build an anonymous array for each file. It appears that this linear approach
    $report = do {
        local *D;
        opendir(D,".");
        my $t = time;
        my $ret;
        while(my $f = readdir(D)) {
            next if $f =~ /^\.\.?$/;
            my $ft = -M $f;
            if($ft < $t) {
                $ret = $f;
                $t = $ft;
            }
        }
        closedir(D);
        $ret;
    } ;
    is a cheap way, and it's faster. Benchmarking gives
    Benchmark: timing 1000 iterations of do, golf, st...
            do:  8 wallclock secs ( 4.00 usr +  4.21 sys =  8.21 CPU) @ 121.80/s (n=1000)
          golf: 74 wallclock secs (20.15 usr + 53.05 sys = 73.20 CPU) @ 13.66/s (n=1000)
            st: 34 wallclock secs (18.96 usr + 14.01 sys = 32.97 CPU) @ 30.33/s (n=1000)
           Rate golf   st   do
    golf 13.7/s   -- -55% -89%
    st   30.3/s 122%   -- -75%
    do    122/s 792% 302%   --

    running on my /usr/share/man/man1 directory. Note the big difference in CPU time. -M is 40% faster than (stat)[9], no wonder, it does look only at modification time and doesn't build a list. Benchmarking the call to ls in backticks doesn't make sense, because perl is blocked during the ls run.
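
The benchmark script itself was not posted, so the following is only a guess at how such a comparison could be set up with the core Benchmark module. The sub names are made up, the labels `do`, `golf` and `st` are taken from the table, and unlike the snippets above the single-pass version here skips all dot files so that the three variants look at the same set of names. The directory is assumed to contain no spaces or glob metacharacters in its path.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub newest_golf {    # sort with -M called in every comparison
    my ($dir) = @_;
    return (sort { -M $a <=> -M $b } glob("$dir/*"))[0];
}

sub newest_st {      # Schwartzian Transform: one -M per file
    my ($dir) = @_;
    return (map  { $_->[1] }
            sort { $a->[0] <=> $b->[0] }
            map  { [ -M $_, $_ ] } glob("$dir/*"))[0];
}

sub newest_do {      # single pass, remember the smallest age seen
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    my ($best, $best_age);
    while (defined(my $f = readdir $dh)) {
        next if $f =~ /^\./;             # skip dot entries, as glob("*") does
        my $age = -M "$dir/$f";
        next unless defined $age;
        ($best, $best_age) = ("$dir/$f", $age)
            if !defined $best_age or $age < $best_age;
    }
    closedir $dh;
    return $best;
}

# Directory to scan: first command-line argument, else the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';

# 200 iterations each; raise the count for steadier numbers.
cmpthese(200, {
    'golf' => sub { newest_golf($dir) },
    'st'   => sub { newest_st($dir) },
    'do'   => sub { newest_do($dir) },
});
```

The numbers will depend heavily on directory size and on whether the directory is already in the filesystem cache, so they need not match the table above.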

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      -M is 40% faster than (stat)[9], no wonder, it does look only at modification time and doesn't build a list.
      It doesn't build a list but it stores all the values from stat. From the -X man page:
      If any of the file tests (or either the "stat" or "lstat" operators) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call.
      So when you use -M _ or -f _ you are retrieving the stored values from the previous stat or file test.
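
A minimal sketch of that reuse: one explicit stat, then several file tests on `_` that read the cached result instead of asking the system again. The sub name `describe` and the keys of the returned hash are made up for illustration.

```perl
use strict;
use warnings;

# One stat() call, several answers: the tests on "_" reuse the stat
# structure filled by the preceding stat instead of making new system calls.
sub describe {
    my ($file) = @_;
    return unless stat $file;          # fills the structure that "_" refers to
    return {
        is_file  => ((-f _) ? 1 : 0),  # plain file?
        size     => ((-s _) || 0),     # size in bytes, 0 for an empty file
        age_days => (-M _),            # days since last modification
    };
}

my $info = describe('.');
printf "is_file=%d size=%d age=%.4f days\n",
    $info->{is_file}, $info->{size}, $info->{age_days};
```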

        Right.

        If I say (stat)[9] I am building a list containing what stat returns and reference the element with index 9 in that list. If I use -M I don't create and operate on a list (although there's an underlying structure, of course).

        Getting the right element from the underlying structure directly is faster than pulling them all out and throwing away all elements but one (even though -M involves time delta calculation).
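
The time delta calculation can be reproduced by hand. perlfunc describes -M as script start time minus file modification time, in days, so the following sketch (sub name `age_in_days` is made up) should agree with -M to within rounding:

```perl
use strict;
use warnings;

# -M is derived from the same stat field as (stat)[9]:
#   age in days = (script start time - mtime) / seconds per day
# $^T holds the script start time in epoch seconds.
sub age_in_days {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    return unless defined $mtime;      # file could not be stat'ed
    return ($^T - $mtime) / 86_400;
}

printf "-M: %.6f  by hand: %.6f\n", (-M '.'), age_in_days('.');
```

Note that a file modified after the script started gets a negative age, from -M as well as from the hand-rolled version.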

        --shmem


Node Type: perlquestion [id://559687]
Approved by GrandFather