PerlMonks  

Search array of file names in directory structure

by Anonymous Monk
on Oct 01, 2012 at 14:59 UTC ( #996672=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I am having an issue searching a directory structure for all HTML files whose names match a list of file names in an array (@filelist). I can find all the HTML files fine by searching recursively with File::Find::Rule, but I am not sure of the best way to get only the file names that appear in @filelist into the @found_html array in my code. In other words, I want to use the names in @filelist to search the directory ($dir) and place the names that are found in @found_html.

Thanks for looking!!!
#!/usr/bin/perl -w
use strict;
use File::Find::Rule;
use File::Basename;

my @filelist = qw(
    1234567_3a_20101000.html 99877_b_20111111.html  99877_c_20111111.html
    99877_d_20111111.html    99877_e_20111111.html  99877_uf_20111111.html
    1234567_g_20101000.html  99877_h_20111111.html  99877_i_20111111.html
    99877_j_20111111.html    99877_k_20111111.html  99877_ll_20111111.html
    1234567_pl_20101000.html 99877_qa_20111111.html 99877_rr_20111111.html
    99877_sx_20111111.html   99877_xy_20111111.html 99877_nm_20111111.html
);

my $dir = '/var/www/files/';

my $find = File::Find::Rule
    ->file
    ->name(qr/\.html?$/)
    ->start( $dir );

my $f_count = 0;
my @found_html;

while ( defined ( my $html_document = $find->match ) ) {
    $f_count++;
    my $filenames = basename( $html_document );
    foreach my $chk_file(@filelist) {
        if($chk_file=~/$filenames/g) {
            push @found_html, $filenames;
        }
    }
}

Re: Search array of file names in directory structure
by kennethk (Monsignor) on Oct 01, 2012 at 15:23 UTC
    If I'm following your spec, the easiest way to accomplish your goal is to use a hash:
    #!/usr/bin/perl -w
    use strict;
    use File::Find::Rule;
    use File::Basename;

    my @filelist = qw(
        1234567_3a_20101000.html 99877_b_20111111.html  99877_c_20111111.html
        99877_d_20111111.html    99877_e_20111111.html  99877_uf_20111111.html
        1234567_g_20101000.html  99877_h_20111111.html  99877_i_20111111.html
        99877_j_20111111.html    99877_k_20111111.html  99877_ll_20111111.html
        1234567_pl_20101000.html 99877_qa_20111111.html 99877_rr_20111111.html
        99877_sx_20111111.html   99877_xy_20111111.html 99877_nm_20111111.html
    );

    my %want;
    $want{$_}++ for @filelist;

    my $dir = '/var/www/files/';

    my $find = File::Find::Rule
        ->file
        ->name(qr/\.html?$/)
        ->start( $dir );

    my $f_count = 0;
    my @found_html;

    while ( defined ( my $html_document = $find->match ) ) {
        $f_count++;
        my $filenames = basename( $html_document );
        push @found_html, $filenames if $want{$filenames};
    }

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
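
A minimal sketch of why the hash lookup behaves differently from the inner loop in the question. The paths in @paths are made up for illustration and stand in for the values that $find->match would return; nothing here touches the file system, and the name 7_b_20111111.html is a hypothetical file chosen to expose the difference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Names we are looking for (a short subset of the original @filelist).
my @filelist = qw(1234567_3a_20101000.html 99877_b_20111111.html);

# Hypothetical paths standing in for the values returned by $find->match.
my @paths = qw(
    /var/www/files/a/1234567_3a_20101000.html
    /var/www/files/b/99877_b_20111111.html
    /var/www/files/b/7_b_20111111.html
    /var/www/files/c/index.html
);

# Build the lookup table once: each wanted name becomes a hash key.
my %want = map { $_ => 1 } @filelist;

# Exact membership test on the base name of each path.
my @found_html = grep { $want{$_} } map { basename($_) } @paths;

# The test from the question, $chk_file =~ /$filenames/, uses the found
# name as an unanchored pattern, so a shorter name such as
# 7_b_20111111.html matches inside 99877_b_20111111.html.
my @loose;
for my $path (@paths) {
    my $name = basename($path);
    push @loose, $name if grep { $_ =~ /$name/ } @filelist;
}

print "hash lookup: @found_html\n";
print "regex test:  @loose\n";
```

With these inputs the hash lookup keeps only the two wanted names, while the regex test also lets 7_b_20111111.html through.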

Re: Search array of file names in directory structure
by Kenosis (Priest) on Oct 01, 2012 at 15:52 UTC

    Try using the following to create an alternation regex for your name rule:

    my $regex = join '|', map "\Q$_\E?", @filelist;

    my $find = File::Find::Rule
        ->file
        ->name(qr/^(?:$regex)$/)
        ->start( $dir );

    while ( defined( my $html_document = $find->match ) ) {
        push @found_html, $html_document;
    }

    $f_count = scalar @found_html;

    The generated regex:

    1234567_3a_20101000\.html?|99877_b_20111111\.html?|99877_c_20111111\.html?|99877_d_20111111\.html?|99877_e_20111111\.html?|99877_uf_20111111\.html?|1234567_g_20101000\.html?|99877_h_20111111\.html?|99877_i_20111111\.html?|99877_j_20111111\.html?|99877_k_20111111\.html?|99877_ll_20111111\.html?|1234567_pl_20101000\.html?|99877_qa_20111111\.html?|99877_rr_20111111\.html?|99877_sx_20111111\.html?|99877_xy_20111111\.html?|99877_nm_20111111\.html?

    Since you've established a regex name rule, you don't need to check the results of $find->match, as it's using that rule.

    Edit: Have updated the regex, based upon kennethk's comments.

      If you are going for an autogenerated regex off a fixed file list, why bother with the grouping? By the time you've gotten to the file extension, you've already passed the more rigorous constraint. Plus, yours can produce false positives due to substring matches, and your filter doesn't allow for checking .htm files, as the original regex did. Simpler and more recyclable:
      my $regex = do {
          my @escaped;
          push @escaped, quotemeta for @filelist;
          my $joined = join '|', @escaped;
          qr/^(?:$joined)$/;
      };
      or, if you like things done in one shot,
      my $regex = '^(?:' . join('|', map quotemeta, @filelist) . ')$';
      or, for the overly clever,
      my $regex = qr/^(?:@{[join '|', map quotemeta, @filelist]})$/;

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
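
The point about substring matches can be checked with a small sketch. The candidate names below are hypothetical and chosen only to expose the differences between an anchored group, an unanchored alternation, and an alternation that is anchored but not grouped.

```perl
use strict;
use warnings;

my @filelist = qw(99877_b_20111111.html 99877_c_20111111.html);

# Escape regex metacharacters (here, the dot) and join with alternation.
my $joined = join '|', map { quotemeta($_) } @filelist;

my $anchored   = qr/^(?:$joined)$/;  # whole name must equal one alternative
my $unanchored = qr/$joined/;        # any substring may match
my $ungrouped  = qr/^$joined$/;      # ^ applies to the first alternative only,
                                     # $ to the last one only

# Hypothetical candidate names.
my @candidates = qw(
    99877_b_20111111.html
    99877_b_20111111.html.bak
    old_99877_c_20111111.html
    99877_b_20111111xhtml
);

for my $name (@candidates) {
    printf "%-28s anchored=%d unanchored=%d ungrouped=%d\n",
        $name,
        ( $name =~ $anchored   ? 1 : 0 ),
        ( $name =~ $unanchored ? 1 : 0 ),
        ( $name =~ $ungrouped  ? 1 : 0 );
}
```

Only the first candidate passes the anchored, grouped pattern. The second and third pass the other two patterns, and the last one is rejected by all three because quotemeta makes the dot literal.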

        Great comments! Actually, I completely missed the l? in the original regex. Unless I'm mistaken, your regex filter also doesn't allow for checking .htm files.

        Have updated my posting. Appreciate you bringing these issues to my attention.

      Hi, can you explain the regular expression in this line?
      my $regex = join '|', map "\Q$_\E?", @filelist;

      If you needed to look for a particular pattern, how would you alter this part of the code: "\Q$_\E?"

        Here's the line, explained a bit:

        my $regex = join '|', map "\Q$_\E?", @filelist;
                          ^         ^ ^ ^^
                          |         | | ||
                          |         | | |+ - Last character in file name optional, i.e., the "l" in html
                          |         | | + - End quote metacharacters
                          |         | + - The default scalar aliased to each @filelist element
                          |         + - Begin quote metacharacters (e.g., the period)
                          + - Join each element with alternation ("or") character

        As shown, this results in the following:

        1234567_3a_20101000\.html?|99877_b_20111111\.html?|99877_c_20111111\.html? ...

        This line builds a regex from the file names in @filelist. When it is passed to File::Find::Rule as the name rule, File::Find::Rule returns only those file names that match the regex, if any.
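
As a rough check of the explanation above, the following sketch builds the same kind of regex from a two-name subset of @filelist and tries it against a few names. The test names ending in .htm and .ht are hypothetical and used only to show the effect of the trailing "?".

```perl
use strict;
use warnings;

my @filelist = qw(99877_b_20111111.html 1234567_g_20101000.html);

# "\Q$_\E?" quotes the metacharacters in each name, then appends a "?" to
# the string. Inside the regex, that "?" makes the preceding character
# (the final "l") optional.
my $regex = join '|', map "\Q$_\E?", @filelist;
print "$regex\n";

# Same anchoring and grouping as in the name rule.
my $name_rule = qr/^(?:$regex)$/;

for my $name (qw(99877_b_20111111.html 99877_b_20111111.htm 99877_b_20111111.ht)) {
    print "$name: ", ( $name =~ $name_rule ? "match" : "no match" ), "\n";
}
```

The .html and .htm names match, while the .ht name does not, since only the final character is optional.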

        If you want File::Find::Rule to look for a particular pattern, change the regex here:

        ->name(qr/^(?:$regex)$/)
                  ^^^^^^^^^^^^

        However, if you want to process the files returned by File::Find::Rule, you can do something like this:

        my @matchingFileNames = grep /pattern/, @found_html;

        where "pattern" represents the regex that would 'filter' the elements of @found_html.
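
A short sketch of that post-filtering step. The contents of @found_html and the pattern itself are assumptions made for illustration: the paths are invented, and the pattern keeps only files whose base name starts with 99877_.

```perl
use strict;
use warnings;

# Hypothetical results, standing in for the paths collected in @found_html.
my @found_html = qw(
    /var/www/files/a/1234567_3a_20101000.html
    /var/www/files/b/99877_b_20111111.html
    /var/www/files/b/99877_c_20111111.html
);

# Illustrative pattern: the last path component must start with "99877_".
my @matchingFileNames = grep { m{/99877_[^/]+\z} } @found_html;

print "$_\n" for @matchingFileNames;
```

Because the elements here are full paths, the pattern is tied to the last path component; if @found_html held base names only, a pattern such as /^99877_/ would do the same job.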
