PerlMonks  

File::Basename is dog slow

by petdance (Parson)
on Dec 19, 2006 at 19:12 UTC ( [id://590748]=perlmeditation )

I was doing some profiling on ack the other night. What I found was amazing. For the simple call:
ack foo ~/big-tree-of-source
about 48% of the time was spent in repeated calls to File::Basename::fileparse() to figure out the extension of the file being checked. I replaced it with a simple regex and the run time was cut in half.

The slowness is undoubtedly aggravated by the function needing a list of all the suffixes it should look for. By removing the call to File::Basename::fileparse(), I also got to throw away the code that built the suffix list.

xoxo,
Andy
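The post does not show the replacement regex, so the following is only a minimal sketch of the idea, not ack's actual code. It compares pulling the extension out with a single regex against asking File::Basename's documented fileparse() function for it. The helper names `ext_by_regex` and `ext_by_fileparse` and the suffix list are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: return the extension (without the dot) of the
# final path component, or '' if there is none.
sub ext_by_regex {
    my ($path) = @_;
    return $path =~ m{[.]([^./\\]+)\z} ? $1 : '';
}

# The same question asked of fileparse(), which has to be told the
# candidate suffixes up front. This list is illustrative only.
my @suffix_patterns = map { qr/[.]\Q$_\E/ } qw(pl pm c h);

sub ext_by_fileparse {
    my ($path) = @_;
    my (undef, undef, $suffix) = fileparse($path, @suffix_patterns);
    $suffix =~ s/^[.]//;    # drop the leading dot, if a suffix matched
    return $suffix;
}

print ext_by_regex('lib/App/Ack.pm'),     "\n";
print ext_by_fileparse('lib/App/Ack.pm'), "\n";
```

Note that the regex version treats a dotfile such as `.bashrc` as having the extension `bashrc`; whether that matters depends on the caller.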

Replies are listed 'Best First'.
Re: File::Basename is dog slow
by runrig (Abbot) on Dec 19, 2006 at 19:30 UTC
    I assume you're talking about this code (from fileparse):

    if (@suffices) {
      $tail = '';
      foreach $suffix (@suffices) {
        my $pat = ($igncase ? '(?i)' : '') . "($suffix)\$";
        if ($basename =~ s/$pat//s) {
          $taint .= substr($suffix,0,0);
          $tail = $1 . $tail;
        }
      }
    }

    It recompiles a regex for every suffix on every call to fileparse. Yuck! This module could use another function that saves a regex to do the suffix checking (in a closure or an object). Or something :-)
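One way the "save a regex in a closure" suggestion could look is sketched below. `make_suffix_splitter` is a hypothetical helper, not something File::Basename provides, and it assumes the suffixes are literal strings rather than patterns. It also strips at most one suffix per call, whereas the loop quoted above can strip several in sequence, so it is not a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::Basename: compile the suffix
# regex once and return a closure that reuses it on every call.
sub make_suffix_splitter {
    my (@suffixes) = @_;    # literal suffixes such as '.pm' or '.tar.gz'

    my $alternation = join '|', map { quotemeta($_) } @suffixes;
    my $re = qr/($alternation)\z/;    # compiled here, exactly once

    return sub {
        my ($basename) = @_;
        # Strip the matched suffix from the local copy and capture it.
        my $suffix = $basename =~ s/$re// ? $1 : '';
        return ($basename, $suffix);
    };
}

my $split = make_suffix_splitter(qw(.pm .pl .gz .tar.gz));
my ($name, $suffix) = $split->('archive.tar.gz');
print "$name $suffix\n";
```

Because the regex is anchored at the end of the string and the engine takes the leftmost match, `.tar.gz` is preferred over `.gz` here without any special ordering of the suffixes.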

      How about just defaulting to stripping with s{[.]([^./\\]*)$}{} (slightly more portable) instead of having to pass in a huge (and likely incomplete) list of possible extensions? That interface never made much sense to me, so I don't use it (and I'm not surprised that it is slow). I'd say, if it hurts, stop doing it: use something other than File::Basename for stripping extensions, such as the regex above, which is likely portable enough for most work. (:

      - tye        

      Might be. I didn't look into it. I just wanted to avoid it.

      xoxo,
      Andy

        You could keep File::Basename and just stop passing in a list of extensions. Just replace that part of the functionality. Stripping extensions is the part of Basename's parsing that is easy to replace portably, since it just strips whatever list of things you gave it. Keeping the added portability of the rest of the parsing is probably worthwhile and probably isn't dog slow. (:

        - tye        
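A minimal sketch of that suggestion, assuming a Unix-style path: fileparse() is called without a suffix list so it only splits directory from filename, and the extension is then stripped with the regex from tye's earlier reply. The helper name `split_path` is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: fileparse() handles the portable directory/filename
# split (no suffix list passed); the regex handles the extension.
sub split_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    my $ext = $name =~ s{[.]([^./\\]*)$}{} ? $1 : '';
    return ($dir, $name, $ext);
}

my ($dir, $name, $ext) = split_path('/usr/lib/perl5/File/Basename.pm');
print "$dir | $name | $ext\n";
```

This keeps File::Basename's handling of directory separators while avoiding the per-suffix pattern loop entirely, since that loop only runs when a suffix list is supplied.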

Re: File::Basename is dog slow
by rinceWind (Monsignor) on Dec 20, 2006 at 11:12 UTC

    I don't know how well it benchmarks, but you might like to try File::Wildcard. ack is the kind of application I designed File::Wildcard for.

    --

    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?
    (Misquoting Janis Joplin)

      Thanks for the heads-up. I'm trying to keep ack's only non-core module to File::Next.

      xoxo,
      Andy
