PerlMonks  

Re: file name parsing to get the original file name

by Abigail-II (Bishop)
on Aug 19, 2003 at 14:11 UTC (#284899)


in reply to Re: file name parsing to get the original file name
in thread file name parsing to get the original file name

That's not very flexible. I can understand not needing to care about separators from different platforms, but your approach only works if you have two directories and then the file. It will fail if there's just one directory, or three.

my $filename = "one/two/three/four/five/six.a";
my $name = (split m{/} => $filename) [-1];

Abigail
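To see why the split idiom does not care how deep the path is, here is a minimal sketch that runs it against paths with zero, one, two and five directories. The helper name last_component is hypothetical and only wraps the one-liner from the post; the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split idiom: split on '/' and keep
# the last field, whatever the number of directories.
sub last_component {
    my ($path) = @_;
    return (split m{/}, $path)[-1];
}

for my $path (qw{ six.a one/six.a one/two/six.a one/two/three/four/five/six.a /etc/passwd }) {
    printf "%-32s => %s\n", $path, last_component($path);
}
```

Note that split discards trailing empty fields, so a path ending in a slash yields the last directory name rather than an empty string.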


Re: Re: file name parsing to get the original file name
by MarkM (Curate) on Aug 20, 2003 at 05:40 UTC

    If we are trying for 'best UNIX-only solution that requires no modules', I vote for:

    my($name) = $path =~ /([^\/]+)\z/;

    I second Abigail-II's suggestion that a module be used, though, as these sorts of problems are generic in nature, and it is very scary to see hundreds of different solutions to the same problem, each with its own independent set of failings.

    At least if a single module is used by everybody, then the code is being exercised in a higher percentage of the possible contexts, and problems will be fixed sooner, rather than being discovered much later.
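This part of the thread does not say which module is meant. One candidate that ships with Perl is File::Basename; the following is a minimal sketch under that assumption, using Unix-style paths taken from the benchmark in this thread.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# basename() returns the final path component; the module picks the
# path syntax based on the platform it runs on.
for my $path (qw{ /etc/passwd one/two/three/four/five/six.a file }) {
    print basename($path), "\n";
}
```

The appeal is the one made above: edge cases are handled in one shared place instead of being rediscovered in every hand-rolled regex.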

    UPDATE: Optimizing the above expression, we can see the speed improve by a factor of 6:

    $path =~ /(?:.*\/)?(.+)/s; my $name = $1;

    It seems that the Perl regular expression engine does a poor job of dealing with matching a pattern at the end of a string. This is not surprising given that most regular expression engines start searching from the beginning of the string.

      Some quick benchmarking shows your solution to be about half as fast as mine.
      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw /cmpthese/;

      our @files = qw {
          /etc/passwd
          one/two/three/four/five/six.a
          file
          a/very/deep/file/indeed/deeper/than/you/may/think/really
      };

      cmpthese -5 => {
          abigail => 'foreach my $f (@files) { my $fn = (split m{/} => $f) [-1] }',
          markm   => 'foreach my $f (@files) { my ($fn) = $f =~ /([^\/]+)\z/ }',
      };

      __END__
      Benchmark: running abigail, markm for at least 5 CPU seconds...
         abigail:  5 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 68404.24/s (n=355018)
           markm:  6 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 34427.53/s (n=180056)
                 Rate   markm abigail
      markm   34428/s      --    -50%
      abigail 68404/s     99%      --

      Abigail

        Interesting. It looks like you've found yet another piece of Perl that isn't implemented in the most optimal manner. :-)

        Playing around, I found that on my system, the following tweak allows the 'single regexp match' to beat the 'split into a temporary list, and grab the last entry' approach by ~15%:

        $f =~ /(?:.*\/)?(.+)/s; my $fn = $1;

        I'm still surprised that Perl can match against '/' several times (split) faster than it can skip to the last '/' with a single rather simple match. It seems the Perl regular expression engine could still use a few optimizations. Until then, I suppose explicit optimization isn't that bad.

        Cheers,
        mark
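Speed aside, it is worth checking that the candidates agree on their results. The sketch below wraps the split idiom and the two regexes from this exchange in subs (the hash keys are illustrative names, not anything from the thread) and compares them on the four benchmark paths only. It makes no claim about inputs outside that list, such as paths with a trailing slash.

```perl
use strict;
use warnings;

# The three approaches discussed so far, wrapped as subs.
my %extract = (
    by_split => sub { (split m{/}, $_[0])[-1] },
    anchored => sub { my ($fn) = $_[0] =~ /([^\/]+)\z/;     $fn },
    greedy   => sub { my ($fn) = $_[0] =~ /(?:.*\/)?(.+)/s; $fn },
);

my @files = qw{
    /etc/passwd
    one/two/three/four/five/six.a
    file
    a/very/deep/file/indeed/deeper/than/you/may/think/really
};

for my $f (@files) {
    my $expected = $extract{by_split}->($f);
    for my $name (sort keys %extract) {
        my $got = $extract{$name}->($f);
        die "$name disagrees on $f\n" unless $got eq $expected;
    }
    print "$f => $expected\n";
}
```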

      Enter sexeger.

      Add this to Abigail's benchmark.

      aristotle => 'foreach my $f (@files) { my ($fn) = reverse($f) =~ m!^(.*?)/?!s; $fn = reverse $fn; }',
      
                    Rate     markm   abigail    markm2 aristotle
      markm      39625/s        --      -56%      -58%      -61%
      abigail    89688/s      126%        --       -4%      -11%
      markm2     93877/s      137%        5%        --       -7%
      aristotle 100885/s      155%       12%        7%        --
      
      Reversing the string (twice!) may be costly, but the simplicity of the regex offsets this. Note that [^/]+ would have been much slower. .*? has been treated to special optimizations.

      Makeshifts last the longest.
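A caveat on the sexeger line as it is quoted in this thread: the trailing /? is optional, so the lazy .*? is satisfied by matching zero characters and the capture comes back empty. The sketch below assumes the intent was to stop at the first slash or at the end of the reversed string; the helper name sexeger_basename is hypothetical, and the benchmark figures above say nothing about this variant.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the path, lazily match up to the first
# '/' (or the end of the string), then reverse the capture back.
sub sexeger_basename {
    my ($path) = @_;
    my $reversed = scalar reverse $path;
    my ($tail) = $reversed =~ m!^(.*?)(?:/|\z)!s;
    return scalar reverse $tail;
}

# The pattern as quoted, for comparison: the capture is the empty string.
my $reversed = scalar reverse "one/two/six.a";
my ($as_quoted) = $reversed =~ m!^(.*?)/?!s;

print "quoted pattern captured: '$as_quoted'\n";
print "anchored variant:        '", sexeger_basename("one/two/six.a"), "'\n";
```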

      Wouldn't it be easier just to do:

      $path=~/.*\/(.*)$/;$name=$1;

      Basically nuke everything in the way and grab what's left??

        The check:
        $path=~/.*\/(.*)$/;$name=$1; 
        
        Doesn't take into account that the path might *not* have a directory component...
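To make that failure concrete: for a bare file name the pattern does not match at all, so $1 is not set by it and $name ends up undefined or holding a capture left over from an earlier successful match. A minimal sketch of one way around it follows, making the directory part optional and using the capture only when the match succeeds. The helper name name_of is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the directory part is optional, and the capture
# is returned only if the match succeeded.
sub name_of {
    my ($path) = @_;
    return $path =~ m{(?:.*/)?(.*)$}s ? $1 : undef;
}

# The pattern from the post requires at least one '/'.
my $posted = qr/.*\/(.*)$/;
print "posted pattern matches 'six.a'? ", ("six.a" =~ $posted ? "yes" : "no"), "\n";

for my $path (qw{ six.a one/six.a one/two/three/four/five/six.a }) {
    print "$path => ", name_of($path), "\n";
}
```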