Glob and lstat

by JediWizard (Deacon)
on Jul 27, 2012 at 19:39 UTC (#984079, perlquestion)

JediWizard has asked for the wisdom of the Perl Monks concerning the following question:

I've been looking through a fairly large code base, updating things and looking for possible performance improvements. I came across a few bits of code that use:

my(@list) = glob('/path/d*/d*');

to get lists of files to process. My tendency in the past has been to either use File::Find for more complicated searches, or just opendir for more basic situations. Being a good scientist, I decided to put together some benchmarks to determine which would be faster. My results consistently showed the opendir method running substantially faster (the subroutines I tested with are included below). I'm glad to have done the tests, but surprised by the results, so I decided to dig a little deeper to figure out why. I ran my tests a few times under strace to get a look at what was actually going on. What I discovered was that with glob, perl was doing an lstat on every single item it returned, which the opendir method clearly did not.

I've looked here, and here, but nothing there has explained why glob is running lstat on every item it returns. Even the various flags accepted by the "bsd_glob" function, which can be exported from File::Glob, do not appear to make use of the data that lstat would provide... so why is perl wasting so many compute cycles getting that information?

sub get_by_glob {
    my @dids = map { /d([^\/]+)$/; $1 } csh_glob("$path/d[0-9]*/d[0-9]*");
    return \@dids;
}

sub get_by_open {
    opendir(my $dh, $path);
    my(@top) = grep(/^d/, readdir($dh));
    closedir($dh);
    my(@dids) = ();
    foreach my $sd (@top) {
        opendir(my $sh, $path.'/'.$sd.'/');
        push @dids, map({m/d(\d+)$/} readdir($sh));
        closedir($sh);
    }
    return \@dids;
}
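For anyone who wants to reproduce the comparison, here is a rough sketch of how such a benchmark could be wired up with the core Benchmark module. The directory fixture (5 top-level directories with 20 subdirectories each, created under a temp dir) is purely an assumption for illustration, not the real layout. It also swaps in bsd_glob for csh_glob so that a temp path containing whitespace is not split, and adds error checks to the opendir calls. Timings will vary by filesystem and perl version, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical fixture: $path/d<N>/d<ID> directories in a throwaway temp dir.
my $path = tempdir(CLEANUP => 1);
for my $top (1 .. 5) {
    mkdir "$path/d$top" or die "mkdir $path/d$top: $!";
    for my $sub (1 .. 20) {
        my $id = $top * 100 + $sub;
        mkdir "$path/d$top/d$id" or die "mkdir $path/d$top/d$id: $!";
    }
}

# glob-based variant (bsd_glob used here in place of csh_glob).
sub get_by_glob {
    my @dids = map { m{/d([^/]+)$} ? $1 : () }
               bsd_glob("$path/d[0-9]*/d[0-9]*");
    return \@dids;
}

# opendir/readdir-based variant, with error checks added.
sub get_by_open {
    opendir(my $dh, $path) or die "opendir $path: $!";
    my @top = grep { /^d/ } readdir $dh;
    closedir $dh;

    my @dids;
    for my $sd (@top) {
        opendir(my $sh, "$path/$sd") or die "opendir $path/$sd: $!";
        push @dids, map { /^d(\d+)$/ ? $1 : () } readdir $sh;
        closedir $sh;
    }
    return \@dids;
}

# Sanity check: both variants should find the same IDs
# (readdir order is unspecified, so compare sorted lists).
my @g = sort { $a <=> $b } @{ get_by_glob() };
my @o = sort { $a <=> $b } @{ get_by_open() };
die "results differ\n" unless "@g" eq "@o";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    glob    => \&get_by_glob,
    opendir => \&get_by_open,
});
```

Checking that both subs return the same set before timing them matters, since the two patterns are not strictly equivalent on arbitrary directory contents.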

Any insight would be appreciated.


They say that time changes things, but you actually have to change them yourself.

—Andy Warhol

Replies are listed 'Best First'.
Re: Glob and lstat
by MidLifeXis (Monsignor) on Jul 27, 2012 at 20:04 UTC

    I see some potential stat calls here and possibly in this area.

    Not being particularly fresh on my C/XS skills, and having other things that need doing, I stopped looking there.

    --MidLifeXis

      Interesting. Forgive me, my C is a little rusty these days... it looks to me like the lstat call is there to facilitate the "GLOB_MARK" functionality, which causes glob to append a / to the end of any returned items which are directories. Unfortunately, it also appears that the lstat operation is performed even if the flag in question was not in fact specified. Oh well, it is good to learn new things.
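      To make the GLOB_MARK behaviour concrete, here is a small sketch of what the flag does from the Perl side. It only demonstrates the documented effect of the flag (a trailing / on directories); it does not show or prove anything about when the underlying lstat happens. The temp directory and the names d1 and d2 are made up for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_MARK);
use File::Temp qw(tempdir);

# Hypothetical fixture: one directory (d1) and one plain file (d2).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/d1" or die "mkdir $dir/d1: $!";
open(my $fh, '>', "$dir/d2") or die "open $dir/d2: $!";
close $fh;

# Without explicit flags, bsd_glob returns the names as they are.
my @plain = bsd_glob("$dir/d*");

# With GLOB_MARK, entries that are directories get a trailing slash.
my @marked = bsd_glob("$dir/d*", GLOB_MARK);

print "plain:  $_\n" for @plain;
print "marked: $_\n" for @marked;
```

      Telling d1 (a directory) apart from d2 (a file) requires file-type information for each match, which fits the reading that the lstat exists to support this flag.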



Re: Glob and lstat
by Anonymous Monk on Jul 28, 2012 at 02:58 UTC
