PerlMonks  

Re: Search array of file names in directory structure

by Kenosis (Priest)
on Oct 01, 2012 at 15:52 UTC (#996682)


in reply to Search array of file names in directory structure

Try using the following to create an alternation regex for your name rule:

my $regex = join '|', map "\Q$_\E?", @filelist;

my $find = File::Find::Rule
    ->file
    ->name(qr/^(?:$regex)$/)
    ->start( $dir );

while ( defined( my $html_document = $find->match ) ) {
    push @found_html, $html_document;
}

$f_count = scalar @found_html;

The generated regex:

1234567_3a_20101000\.html?|99877_b_20111111\.html?|99877_c_20111111\.html?|99877_d_20111111\.html?|99877_e_20111111\.html?|99877_uf_20111111\.html?|1234567_g_20101000\.html?|99877_h_20111111\.html?|99877_i_20111111\.html?|99877_j_20111111\.html?|99877_k_20111111\.html?|99877_ll_20111111\.html?|1234567_pl_20101000\.html?|99877_qa_20111111\.html?|99877_rr_20111111\.html?|99877_sx_20111111\.html?|99877_xy_20111111\.html?|99877_nm_20111111\.html?
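
For anyone who wants to see the effect of the trailing ? without touching the file system, here is a minimal core-only sketch. The two-element @filelist and the test names are made up for illustration, and $name_rule is just a hypothetical variable holding the same anchored pattern that is passed to ->name above.

```perl
use strict;
use warnings;

# Hypothetical sample list; the real @filelist is not shown in full.
my @filelist = ('1234567_3a_20101000.html', '99877_b_20111111.html');

# Escape each name, then make its final character optional.
my $regex     = join '|', map "\Q$_\E?", @filelist;
my $name_rule = qr/^(?:$regex)$/;

# Try an exact name, a .htm variant, and a name with a stray prefix.
for my $name ('99877_b_20111111.html',
              '99877_b_20111111.htm',
              'x99877_b_20111111.html') {
    printf "%-26s %s\n", $name, $name =~ $name_rule ? 'match' : 'no match';
}
```

The first two names should match and the third should not, since the ^ and $ anchors require the whole name to be one of the alternatives.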

Since you've established a regex name rule, you don't need to check the results of $find->match, as it's using that rule.

Edit: Have updated the regex, based upon kennethk's comments.


Re^2: Search array of file names in directory structure
by kennethk (Monsignor) on Oct 01, 2012 at 17:34 UTC
    If you are going for an autogenned regex off a fixed file list, why bother with the grouping? By the time you've gotten to the file extension, you've already passed the more rigorous constraint. Plus, yours can fail positive due to substring match and your filter doesn't allow for checking .htm files, as in the original regex. Simpler and more recyclable:
    my $regex = do {
        my @escaped;
        push @escaped, quotemeta for @filelist;
        my $joined = join '|', @escaped;
        qr/^(?:$joined)$/;
    };
    or, if you like things done in 1 shot,
    my $regex = '^(?:' . join('|', map quotemeta, @filelist) . ')$';
    or, for the overly clever,
    my $regex = qr/^(?:@{[join '|', map quotemeta, @filelist]})$/;
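
The kind of substring false positive mentioned above can be sketched as follows. This is only an illustration under assumed data: the file list, the $candidate name, and the variable names $unanchored and $anchored are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical file list for illustration.
my @filelist = ('99877_b_20111111.html', '99877_c_20111111.html');

my $joined     = join '|', map quotemeta, @filelist;
my $unanchored = qr/$joined/;          # may match anywhere inside a name
my $anchored   = qr/^(?:$joined)$/;    # must match the whole name

# A name that merely contains one of the listed names.
my $candidate = 'old_99877_b_20111111.html.bak';

print "unanchored: ", ($candidate =~ $unanchored ? 'match' : 'no match'), "\n";
print "anchored:   ", ($candidate =~ $anchored   ? 'match' : 'no match'), "\n";
```

The (?:...) group matters here: written as ^a|b$, the ^ would apply only to the first alternative and the $ only to the last.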

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Great comments! Actually, I completely missed the l? in the original regex. Unless I'm mistaken, your regex filter also doesn't allow for checking .htm files.

      Have updated my posting. Appreciate you bringing these issues to my attention.

        It doesn't, but only insofar as it only checks the literals of the passed list. At some point, of course, all this becomes academic, since we don't actually know the entire use case; TIMTOWTDI.



Re^2: Search array of file names in directory structure
by Anonymous Monk on Oct 02, 2012 at 15:41 UTC
    Hi, can you explain the Reg. Expression from this line?
    my $regex = join '|', map "\Q$_\E?", @filelist;

    If you need to look for a particular pattern, how would you alter this code: "\Q$_\E?"

      Here's the line, explained a bit:

      my $regex = join '|', map "\Q$_\E?", @filelist;
                        ^         ^ ^ ^^
                        |         | | ||
                        |         | | |+ - Last character in file name optional, i.e., the "l" in html
                        |         | | + - End quote metacharacters
                        |         | + - The default scalar aliased to each @filelist element
                        |         + - Begin quote metacharacters (e.g., the period)
                        + - Join each element with alternation ("or") character

      As shown, this results in the following:

      1234567_3a_20101000\.html?|99877_b_20111111\.html?|99877_c_20111111\.html? ...

      This line builds a regex using the file names in @filelist. When it is passed to File::Find::Rule as the name rule, File::Find::Rule returns only those files whose names match the regex, if any.

      If you want File::Find::Rule to look for a particular pattern, change the regex here:

      ->name(qr/^(?:$regex)$/)
                ^^^^^^^^^^^^

      However, if you want to process the files returned by File::Find::Rule, you can do something like this:

      my @matchingFileNames = grep /pattern/, @found_html;

      where "pattern" represents the regex that would 'filter' the elements of @found_html.
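
As a rough, self-contained sketch of that post-filtering step: the paths in @found_html below are invented stand-ins for whatever File::Find::Rule returned, and the 99877_ prefix pattern is only an assumed example of a "particular pattern".

```perl
use strict;
use warnings;

# Hypothetical results, standing in for what File::Find::Rule returned.
my @found_html = (
    'docs/1234567_3a_20101000.html',
    'docs/99877_b_20111111.html',
    'docs/99877_c_20111111.htm',
);

# Keep only paths whose base name starts with the 99877_ prefix.
my @matchingFileNames = grep { m{(?:^|/)99877_[^/]*$} } @found_html;

print "$_\n" for @matchingFileNames;
```

Filtering after the fact like this leaves the name rule untouched, which can be handy when the same result list needs to be sliced several different ways.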
