http://www.perlmonks.org?node_id=285107


in reply to Re: file name parsing to get the original file name
in thread file name parsing to get the original file name

If we are trying for 'best UNIX-only solution that requires no modules', I vote for:

my($name) = $path =~ /([^\/]+)\z/;

I second Abigail-II's suggestion to use a module, though: these sorts of problems are generic in nature, and it is very scary to see hundreds of different solutions to the same problem, each with its own independent set of failings.

At least if a single module is used by everybody, then the code is being exercised in a higher percentage of the possible contexts, and problems will be fixed sooner, rather than being discovered much later.
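For comparison, the module route might look like the sketch below. It assumes the core File::Basename module is the kind of module meant here (this post does not name one), and the sample paths are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Sample paths are illustrative only.
my @paths = ('/etc/passwd', 'one/two/three/four/five/six.a', 'file');

# basename() returns the final component of each path.
my @names = map { basename($_) } @paths;

print "$_\n" for @names;
```

The module also handles cases a one-line regex tends to ignore, such as trailing slashes and non-UNIX path conventions.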

UPDATE: Optimizing the above expression, we can see the speed improve by a factor of 6:

$path =~ /(?:.*\/)?(.+)/s; my $name = $1;

It seems that the Perl regular expression engine does a poor job of dealing with matching a pattern at the end of a string. This is not surprising given that most regular expression engines start searching from the beginning of the string.
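The two expressions are not quite interchangeable, which is worth checking before swapping one for the other. The sketch below wraps each in a sub (name_anchored and name_greedy are made-up names) and runs them on a few illustrative paths, including one with a trailing slash.

```perl
use strict;
use warnings;

# Wraps the first expression: non-slash characters anchored at the end.
sub name_anchored {
    my ($path) = @_;
    my ($name) = $path =~ /([^\/]+)\z/;
    return $name;    # undef when the path ends in a slash
}

# Wraps the second expression: optional greedy directory part, then the rest.
sub name_greedy {
    my ($path) = @_;
    # List assignment avoids reading a stale $1 after a failed match.
    my ($name) = $path =~ /(?:.*\/)?(.+)/s;
    return $name;
}

for my $path ('/etc/passwd', 'file', 'dir/') {
    my $first  = name_anchored($path);
    my $second = name_greedy($path);
    printf "%-12s anchored=%s greedy=%s\n",
        $path,
        defined $first  ? $first  : 'undef',
        defined $second ? $second : 'undef';
}
```

Both agree on '/etc/passwd' and 'file'. For 'dir/' the anchored form fails to match, while the greedy form falls back to returning the whole string 'dir/'.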

Re: file name parsing to get the original file name
by Abigail-II (Bishop) on Aug 20, 2003 at 08:00 UTC
    Some quick benchmarking shows your solution to be about half as fast compared to mine.
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw /cmpthese/;

    our @files = qw {
        /etc/passwd
        one/two/three/four/five/six.a
        file
        a/very/deep/file/indeed/deeper/than/you/may/think/really
    };

    cmpthese -5 => {
        abigail => 'foreach my $f (@files) { my $fn = (split m{/} => $f) [-1] }',
        markm   => 'foreach my $f (@files) { my ($fn) = $f =~ /([^\/]+)\z/ }',
    };

    __END__
    Benchmark: running abigail, markm for at least 5 CPU seconds...
       abigail:  5 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 68404.24/s (n=355018)
         markm:  6 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 34427.53/s (n=180056)
               Rate   markm abigail
    markm   34428/s      --    -50%
    abigail 68404/s     99%      --

    Abigail

      Interesting. It looks like you've found yet another piece of Perl that isn't implemented in the most optimal manner. :-)

      Playing around, I found that on my system, the following tweak allows the 'single regexp match' to beat the 'split into a temporary list, and grab the last entry' approach by ~15%:

      $f =~ /(?:.*\/)?(.+)/s; my $fn = $1;

      I'm still surprised that Perl can match against '/' several times (split) faster than it can skip to the last '/' with a single rather simple match. It seems the Perl regular expression engine could still use a few optimizations. Until then, I suppose explicit optimization isn't that bad.

      Cheers,
      mark

        I'm still surprised that Perl can match against '/' several times (split) faster than it can skip to the last '/' with a single rather simple match.

        I'm not surprised. /([^\/]+)\z/ is a rather complicated regex due to the character class being used. The optimizer can't figure out that it is the same as "looking for the last slash". m!/! on the other hand is so simple that the optimizer recognizes it as searching for a fixed string.

        Abigail
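Since the task really is "looking for the last slash", it can also be written as a fixed-string search with no regex at all. The sketch below uses the builtins rindex and substr. It is a swapped-in technique that was not benchmarked in this thread, and last_component is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the last '/'.
sub last_component {
    my ($path) = @_;
    # rindex returns -1 when there is no slash, so the +1 makes
    # substr start at offset 0 and return the whole string.
    return substr($path, rindex($path, '/') + 1);
}

print last_component($_), "\n" for ('/etc/passwd', 'file', 'one/two/six.a');
```

A path that ends in a slash yields an empty string here, so callers that care about that case need to check for it.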

Re^3: file name parsing to get the original file name (regex performance)
by Aristotle (Chancellor) on Aug 24, 2003 at 05:43 UTC

    Enter sexeger.

    Add this to Abigail's benchmark.

    aristotle => 'foreach my $f (@files) { my ($fn) = reverse($f) =~ m!^(.*?)(?:/|\z)!s; $fn = reverse $fn; }',
    
                  Rate     markm   abigail    markm2 aristotle
    markm      39625/s        --      -56%      -58%      -61%
    abigail    89688/s      126%        --       -4%      -11%
    markm2     93877/s      137%        5%        --       -7%
    aristotle 100885/s      155%       12%        7%        --
    
    Reversing the string (twice!) may be costly, but the simplicity of the regex offsets this. Note that [^/]+ would have been much slower, whereas .*? has been treated to special optimizations.

    Makeshifts last the longest.
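Outside of a benchmark string, the reversed-regex idea might be written as the sketch below. The helper name name_sexeger is made up, and the pattern assumes the capture should stop at the first slash of the reversed string or at its end if there is no slash.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the reversed-regex ("sexeger") idea.
sub name_sexeger {
    my ($path) = @_;
    # Scalar assignment puts reverse in scalar context, so it reverses
    # the characters and the file name ends up at the front.
    my $reversed = reverse $path;
    # Capture lazily up to the first slash, or to the end if there is none.
    my ($eman) = $reversed =~ m!^(.*?)(?:/|\z)!s;
    # Reverse the captured piece back into reading order.
    return scalar reverse $eman;
}

print name_sexeger($_), "\n" for ('/etc/passwd', 'file', 'a/b/six.a');
```

The explicit scalar on the return matters: without it, reverse in list context would return its one-element argument list unchanged.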

Re^3: file name parsing to get the original file name
by rthawkcom (Novice) on Jun 22, 2011 at 15:54 UTC
    Wouldn't it be easier just to do:

    $path=~/.*\/(.*)$/;$name=$1;

    Basically nuke everything in the way and grab what's left??

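      The check:
      $path=~/.*\/(.*)$/;$name=$1;

      doesn't take into account that the path might *not* have a directory component. With no slash in $path the match fails, and $name is assigned whatever $1 held from the last successful match (or undef).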
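A small sketch of that failure mode, using illustrative paths: capturing through list assignment makes the failed match visible as undef instead of a stale $1, and making the directory part optional (as in the expression from the top of the thread) lets a bare file name match.

```perl
use strict;
use warnings;

my @results;
for my $path ('/etc/passwd', 'file') {
    # Requires a slash: the match fails for a bare file name, and the
    # list assignment then leaves $strict undefined.
    my ($strict) = $path =~ /.*\/(.*)$/;

    # Optional directory part: a bare file name matches as well.
    my ($lenient) = $path =~ /(?:.*\/)?(.*)$/s;

    push @results, [ $path, $strict, $lenient ];
}

for my $row (@results) {
    my ($path, $strict, $lenient) = @$row;
    printf "%-12s strict=%s lenient=%s\n",
        $path,
        defined $strict  ? $strict  : 'undef',
        defined $lenient ? $lenient : 'undef';
}
```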