
Re: file name parsing to get the original file name

by Abigail-II (Bishop)
on Aug 19, 2003 at 14:11 UTC (#284899)

in reply to Re: file name parsing to get the original file name
in thread file name parsing to get the original file name

That's not very flexible. I can understand not needing to care about separators from different platforms, but your approach only works if you have two directories and then the file. It will fail if there's just one directory, or three.

my $filename = "one/two/three/four/five/six.a";
my $name = (split m{/} => $filename) [-1];
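To illustrate the point about depth, here is a minimal sketch that wraps the same split in a helper and runs it over paths with zero, one, two, and five directories. The helper name `last_component` and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split on '/' and keep the last field,
# regardless of how many directories precede the file name.
sub last_component {
    my ($path) = @_;
    return (split m{/}, $path)[-1];
}

my @paths = (
    'six.a',
    'one/six.a',
    'one/two/six.a',
    'one/two/three/four/five/six.a',
);

print "$_ => ", last_component($_), "\n" for @paths;
```

Every path above should yield the same file name, since only the final field is kept.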


Replies are listed 'Best First'.
Re: Re: file name parsing to get the original file name
by MarkM (Curate) on Aug 20, 2003 at 05:40 UTC

    If we are trying for 'best UNIX-only solution that requires no modules', I vote for:

    my($name) = $path =~ /([^\/]+)\z/;

    I second Abigail-II's suggestion that a module be used, though, as these sorts of problems are generic in nature, and it is very scary to see hundreds of different solutions to the same problem, each with its own independent set of failings.

    At least if a single module is used by everybody, then the code is being exercised in a higher percentage of the possible contexts, and problems will be fixed sooner, rather than being discovered much later.
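The post does not name a module. As an assumption, the core File::Basename module is one candidate, and a minimal sketch of the module route might look like this (the sample path is borrowed from earlier in the thread):

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse);

my $path = 'one/two/three/four/five/six.a';

# basename() returns the last path component.
my $name = basename($path);

# fileparse() also separates the directory part and, given a
# suffix pattern, the extension.
my ($base, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "name=$name base=$base dir=$dir suffix=$suffix\n";
```

File::Basename picks its parsing rules from the current platform, which is part of the argument for a shared module over a hand-rolled regex.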

    UPDATE: Optimizing the above expression, we can see the speed improve by a factor of 6:

    $path =~ /(?:.*\/)?(.+)/s; my $name = $1;

    It seems that the Perl regular expression engine does a poor job of dealing with matching a pattern at the end of a string. This is not surprising given that most regular expression engines start searching from the beginning of the string.
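As a quick sanity check, the two expressions can be wrapped in helpers and compared on a few sample paths. This is only a sketch: the helper names `name_anchored` and `name_greedy` are hypothetical, and the samples deliberately avoid trailing slashes, where the two patterns need not agree.

```perl
use strict;
use warnings;

# Original form: one or more non-slash characters anchored at end of string.
sub name_anchored {
    my ($path) = @_;
    my ($name) = $path =~ /([^\/]+)\z/;
    return $name;
}

# Rewritten form: greedily consume everything up to the last slash
# (if there is one), then capture the rest.
sub name_greedy {
    my ($path) = @_;
    return $path =~ /(?:.*\/)?(.+)/s ? $1 : undef;
}

for my $path ('/etc/passwd', 'one/two/three/four/five/six.a', 'file') {
    printf "%-32s %-8s %-8s\n", $path, name_anchored($path), name_greedy($path);
}
```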

      Some quick benchmarking shows your solution to be about half as fast compared to mine.
      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw /cmpthese/;

      our @files = qw {
          /etc/passwd
          one/two/three/four/five/six.a
          file
          a/very/deep/file/indeed/deeper/than/you/may/think/really
      };

      cmpthese -5 => {
          abigail => 'foreach my $f (@files) { my $fn = (split m{/} => $f) [-1] }',
          markm   => 'foreach my $f (@files) { my ($fn) = $f =~ /([^\/]+)\z/ }',
      };

      __END__
      Benchmark: running abigail, markm for at least 5 CPU seconds...
         abigail:  5 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 68404.24/s (n=355018)
           markm:  6 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 34427.53/s (n=180056)
                 Rate   markm abigail
      markm   34428/s      --    -50%
      abigail 68404/s     99%      --


        Interesting. It looks like you've found yet another piece of Perl that isn't implemented in the most optimal manner. :-)

        Playing around, I found that on my system, the following tweak allows the 'single regexp match' to beat the 'split into a temporary list, and grab the last entry' approach by ~15%:

        $f =~ /(?:.*\/)?(.+)/s; my $fn = $1;

        I'm still surprised that Perl can match against '/' several times (split) faster than it can skip to the last '/' with a single rather simple match. It seems the Perl regular expression engine could still use a few optimizations. Until then, I suppose explicit optimization isn't that bad.
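For anyone who wants to repeat the comparison, here is a rough sketch that puts the tweaked regex next to the other two approaches. It uses code references and small helper subs rather than the string form of the original benchmark, so the sub-call overhead will shift the absolute numbers. The helper names and the `markm2` label are assumptions, and results will vary by Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @files = qw(
    /etc/passwd
    one/two/three/four/five/six.a
    file
    a/very/deep/file/indeed/deeper/than/you/may/think/really
);

# Hypothetical helpers, one per approach discussed in the thread.
sub via_split  { my ($f) = @_; return (split m{/}, $f)[-1] }
sub via_anchor { my ($f) = @_; my ($fn) = $f =~ /([^\/]+)\z/; return $fn }
sub via_greedy { my ($f) = @_; $f =~ /(?:.*\/)?(.+)/s; return $1 }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    abigail => sub { my $fn; $fn = via_split($_)  for @files },
    markm   => sub { my $fn; $fn = via_anchor($_) for @files },
    markm2  => sub { my $fn; $fn = via_greedy($_) for @files },
});
```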


      Enter sexeger.

      Add this to Abigail's benchmark.

      aristotle => 'foreach my $f (@files) { my ($fn) = reverse($f) =~ m!^(.*?)/?!s; $fn = reverse $fn; }',
                    Rate     markm   abigail    markm2 aristotle
      markm      39625/s        --      -56%      -58%      -61%
      abigail    89688/s      126%        --       -4%      -11%
      markm2     93877/s      137%        5%        --       -7%
      aristotle 100885/s      155%       12%        7%        --
      Reversing the string (twice!) may be costly, but the simplicity of the regex offsets this. Note that [^/]+ would have been much slower. .*? has been treated to special optimizations.
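One caution about the pattern as it appears here: a lazy `.*?` followed by an optional `/?` can succeed by matching the empty string, so it is worth checking what it captures before trusting the timings. A minimal sketch of the same reverse, match, reverse idea, with the lazy match forced to run up to the first slash or the end of the string, might look like this (the helper name `name_sexeger` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the path, lazily capture up to the first
# slash (or the end of the string if there is none), then reverse back.
sub name_sexeger {
    my ($path) = @_;
    my $reversed = reverse $path;                      # scalar reverse
    my ($rev_name) = $reversed =~ m!^(.*?)(?:/|\z)!s;
    return scalar reverse $rev_name;
}

print name_sexeger($_), "\n"
    for '/etc/passwd', 'one/two/three/four/five/six.a', 'file';
```

The `(?:/|\z)` alternation is what stops the lazy quantifier from settling for zero characters.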

      Makeshifts last the longest.

      Wouldn't it be easier just to do:


      Basically nuke everything in the way and grab what's left??

        The check:
        Doesn't take into account that the path might *not* have a directory component...
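The snippets under discussion are not preserved in this copy of the thread. Purely as an assumption about what a "strip everything in the way" approach could look like, here is a sketch that deletes everything up to the last slash and still copes with a path that has no directory component (the helper name `strip_dirs` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: copy the path, then delete everything up to and
# including the last slash. If there is no slash, the substitution does
# not match and the copy is returned unchanged.
sub strip_dirs {
    my ($path) = @_;
    (my $name = $path) =~ s{^.*/}{}s;
    return $name;
}

print strip_dirs($_), "\n" for 'one/two/six.a', 'six.a';
```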
