PerlMonks  

Multiple regex against files with speed

by crabbdean (Pilgrim)
on Apr 28, 2004 at 13:06 UTC (#348807=perlquestion)
crabbdean has asked for the wisdom of the Perl Monks concerning the following question:

Hi Fellow Monks and Nuns,

I'm rewriting a Perl module to do file listing because, quite frankly, I don't like File::Find. I've got it listing beautifully; it's fast and object-oriented. Now I want to add features. One is the ability to specify types of files or directories to match against.

In essence, you have to be able to specify many types of files (e.g. ~*.doc, ~*.tmp, etc.), and each file then has to be checked against ALL of these types. Simply put, what is the fastest way of doing this?

So far I've considered:
1. Storing each search pattern as a hash value, then matching the file name against each value in turn:

    $found = 0;
    $hash{1} = '*.doc';
    $hash{2} = '*.xls';
    $hash{3} = '*.ppt';
    map { $found = 1 if $file =~ /$_/; } values(%hash);

2. Or stringing all the search strings together into one big regex:

    $found = 0;
    $hash{1} = '*.doc';
    $hash{2} = '*.xls';
    $hash{3} = '*.ppt';
    for (values(%hash)) { $string .= $_ . "|"; }
    $found = 1 if $file =~ /$string/;
Any other ideas? Speed is the key here. Thanks.

Oh, and if you have any other requests for features you'd like in this module, or common operations you perform on files in your scripts, I'll see what I can do to add them.

Dean
The Funkster of Mirth
Programming these days takes more than a lone avenger with a compiler. - sam
RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers

Re: Multiple regex against files with speed
by jeffa (Chancellor) on Apr 28, 2004 at 13:11 UTC
      The module I've built already breaks the full file name into four pieces: a relative path, base directory, full file name and file name. So I've got that bit covered nicely. The problem with "File::Find::Rule" is that it uses "File::Find", which I'm trying to avoid. Unfortunately, since "File::Find" is a core Perl module, this means reinventing the wheel. Ho hum! I do like the interface of some of the "File::Find::Rule" methods, though. I'll have a read through his code.

      Dean
Re: Multiple regex against files with speed
by Fletch (Chancellor) on Apr 28, 2004 at 13:19 UTC

    perldoc -f grep, and read about qr// in perldoc perlop.
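    A minimal sketch of what that pointer suggests: compile each pattern once with qr// and test a file name against the list with grep. The patterns and file names are hypothetical, and they are written as real regexes (escaped dot, \z anchor), not globs.

```perl
use strict;
use warnings;

# Hypothetical patterns, compiled once with qr// rather than on every match.
my @patterns = (qr/\.doc\z/i, qr/\.xls\z/i, qr/\.ppt\z/i);

# Returns the number of patterns that match $file (true if any match).
# grep in scalar context yields the count of matching elements.
sub matches_any {
    my ($file) = @_;
    return scalar grep { $file =~ $_ } @patterns;
}

for my $file (qw(report.doc budget.XLS notes.txt)) {
    printf "%-10s %s\n", $file, matches_any($file) ? 'match' : 'no match';
}
```

    Note that grep tries every pattern; List::Util's first stops at the first pattern that matches, which may matter with a long pattern list.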

Re: Multiple regex against files with speed
by pelagic (Curate) on Apr 28, 2004 at 14:13 UTC
    Benchmark your solutions. There are plenty of Benchmark examples around here.
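    A sketch of such a comparison using the core Benchmark module's cmpthese, assuming a hypothetical list of 600 file names and three extensions. It first checks that the "one regex per pattern" and "one combined regex" approaches agree, then times them; the numbers will depend on your data and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: a mix of matching and non-matching names.
my @files = map { ("file$_.doc", "file$_.txt", "file$_.ppt") } 1 .. 200;

# Approach 1: one precompiled regex per pattern, tried in turn.
my @separate = (qr/\.doc\z/, qr/\.xls\z/, qr/\.ppt\z/);

# Approach 2: a single alternation covering the same extensions.
my $combined = qr/\.(?:doc|xls|ppt)\z/;

sub count_separate {
    my $n = 0;
    for my $f (@files) {
        for my $re (@separate) {
            if ($f =~ $re) { $n++; last }    # stop at the first matching pattern
        }
    }
    return $n;
}

sub count_combined {
    my $n = 0;
    for my $f (@files) { $n++ if $f =~ $combined }
    return $n;
}

# Sanity check: both approaches must give the same answer before timing them.
die "approaches disagree\n" unless count_separate() == count_combined();

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    separate => \&count_separate,
    combined => \&count_combined,
});
```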

    pelagic
Re: Multiple regex against files with speed
by eserte (Deacon) on Apr 28, 2004 at 14:15 UTC
    If you're after speed, take a look at Regexp::Optimizer. It combines several regexes into one regexp and additionally does trie optimization.

    That assumes you really mean regexps; your example code uses file globs instead.

Re: Multiple regex against files with speed
by Roy Johnson (Monsignor) on Apr 28, 2004 at 14:34 UTC
    I'm curious about why you're using a hash as an array instead of using an array.

    Note that you'll need to convert your glob patterns to actual regexes if you want to do regex matching.
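    A minimal sketch of that conversion, using a hypothetical helper glob_to_re. It handles only * and ? and escapes everything else with quotemeta; character classes, brace expansion and the shell's leading-dot rule are not supported. CPAN's Text::Glob provides a fuller glob_to_regex.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple shell glob into an anchored regex.
# '*' becomes '.*', '?' becomes '.', every other character is quotemeta'd.
sub glob_to_re {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        if    ($ch eq '*') { $re .= '.*' }
        elsif ($ch eq '?') { $re .= '.' }
        else               { $re .= quotemeta $ch }
    }
    # Anchor both ends so the glob must match the whole name, as in the shell.
    return qr/\A$re\z/s;
}

my @regexes = map { glob_to_re($_) } ('*.doc', '~*.tmp', 'report-??.xls');

for my $file (qw(letter.doc ~draft.tmp report-07.xls report-2004.xls)) {
    my $hit = grep { $file =~ $_ } @regexes;
    print "$file: ", ($hit ? 'match' : 'no match'), "\n";
}
```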

    Stringing the regexes together is more naturally done with join than with a foreach loop and concatenation.
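    A sketch of the join approach, assuming a hypothetical list of extensions. join puts '|' only between items, whereas appending "|" after every item leaves a trailing empty alternative that matches any string. The (?:...) group makes the \. and \z apply to every alternative.

```perl
use strict;
use warnings;

# Hypothetical extension list; quotemeta guards against regex metacharacters.
my @extensions = qw(doc xls ppt);

# join yields "doc|xls|ppt": separators between items, none at the end.
my $alternation = join '|', map { quotemeta($_) } @extensions;
my $combined    = qr/\.(?:$alternation)\z/i;

for my $file (qw(plan.ppt data.csv memo.DOC)) {
    print "$file: ", ($file =~ $combined ? 'match' : 'no match'), "\n";
}
```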


    The PerlMonk tr/// Advocate

Node Type: perlquestion [id://348807]
Approved by Thelonius