PerlMonks  

File::Find find several strings in one directory

by Staralfur (Novice)
on May 09, 2017 at 16:53 UTC (#1189923=perlquestion)

Staralfur has asked for the wisdom of the Perl Monks concerning the following question:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my $dir    = '/home/me/Desktop/tasks/Korpora/';
    my $string = 'Alice ';

    open my $results, '>', '/home/me/Desktop/results.txt'
        or die "Unable to open results file: $!";

    find(\&printFile, $dir);

    sub printFile {
        return unless -f and /\.html$/;

        open my $fh, '<', $_ or do {
            warn qq(Unable to open "$File::Find::name" for reading: $!);
            return;
        };

        while (<$fh>) {
            if (/\Q$string/) {
                print $results "$File::Find::name\n";
                return;
            }
        }
    }

Dear Monks, I have a big directory of HTML articles in which I would like to search for several strings at the same time, for example: Alice, Tom, Jamie, party, something. How can I do this with the File::Find module? I have already tried searching for $string1 and $string2, and also for $string1 && $string2, but it doesn't work. I am a newbie and haven't worked with File::Find before, but it seems to be the easiest way. Can you help me? Thanks in advance!

Replies are listed 'Best First'.
Re: File::Find find several strings in one directory
by stevieb (Canon) on May 09, 2017 at 17:24 UTC

    You can use grouping with the or operator within the regex, combined with word boundaries:

        use warnings;
        use strict;
        use feature 'say';

        my @strings = (
            "Bob said hello\n",
            "Alice doesn't like Chris\n",
            "This line won't match\n",
        );

        for (@strings) {
            say "yay!" if /(?:\bAlice\b|\bBob\b|\bChris\b)/;
        }

    That says:

        /
            (?:             # group, but don't capture
                \bAlice\b   # match Alice, if it is a standalone word
                |           # or
                \bBob\b     # same as Alice above
                |           # or
                \bChris\b   # same as Alice and Bob
            )               # end grouping
        /x
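
    The alternation above can also be built from a list of search terms and plugged into the File::Find callback from the original question. What follows is only a minimal sketch under some assumptions: the helper names (build_pattern, file_matches, find_matching_files) are made up for illustration, the word list is the one from the question, and the directory defaults to the current one if none is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Search terms taken from the question; adjust as needed.
my @words = ('Alice', 'Tom', 'Jamie', 'party', 'something');

# Build one alternation from a list of terms. quotemeta escapes regex
# metacharacters, and \b keeps "Tom" from matching inside "Tomato".
sub build_pattern {
    my @terms = @_;
    my $alternation = join '|', map { quotemeta($_) } @terms;
    return qr/\b(?:$alternation)\b/;
}

# Return 1 as soon as any line of the file matches, otherwise 0.
sub file_matches {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or do {
        warn qq(Unable to open "$path" for reading: $!\n);
        return 0;
    };
    while (my $line = <$fh>) {
        return 1 if $line =~ $pattern;
    }
    return 0;
}

# Walk the tree below $dir and collect the .html files that match.
sub find_matching_files {
    my ($dir, $pattern) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.html$/;
        push @hits, $File::Find::name if file_matches($_, $pattern);
    }, $dir);
    my @sorted = sort @hits;
    return @sorted;
}

# Directory from the command line, or the current directory (assumption).
my $dir = shift || '.';
print "$_\n" for find_matching_files($dir, build_pattern(@words));
```

    This reports a file if any one of the words occurs in it. If every word has to occur, a single alternation is not enough, and each word would need its own check per file.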

      Thank you, I will try it!

Re: File::Find find several strings in one directory
by FreeBeerReekingMonk (Deacon) on May 09, 2017 at 19:02 UTC
    You seem to have a Unix environment, why not try ack? (It is written in Perl :) )

    ack is simple to install via CPAN, a package manager, or a plain download.

    For work (Windows) I use docfetcher (although it is ancient).

    For a one-time search (this looks for foo or bar inside .html and .htm files):

        find . -name '*.htm*' -exec grep -i -e foo -e bar {} /dev/null \;

      I have to do it on my own, because it is a university assignment. I wish I could have used it... I have no idea what my script should look like in order to extract the matches from the current directory into a new one.
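
    For the follow-up about extracting the matches into a new directory, one possible approach is to collect the matching file names first and then copy them with the core File::Copy module. This is only a sketch: copy_matches is a hypothetical helper name, and it assumes that "extract" means copying each matching file unchanged and that files with the same base name should not overwrite each other.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Path qw(make_path);

# Copy every file listed in @$files into $target_dir, creating the
# directory if it does not exist yet. Returns the paths that were written.
sub copy_matches {
    my ($files, $target_dir) = @_;
    make_path($target_dir) unless -d $target_dir;

    my @copied;
    for my $source (@$files) {
        my $dest = "$target_dir/" . basename($source);

        # Two matches in different subdirectories may share a base name.
        if (-e $dest) {
            warn qq(Skipping "$source": "$dest" already exists\n);
            next;
        }

        if (copy($source, $dest)) {
            push @copied, $dest;
        }
        else {
            warn qq(Could not copy "$source" to "$dest": $!\n);
        }
    }
    return @copied;
}
```

    It would be called with a reference to the list of matching files and a target path of your choice, for example copy_matches(\@list, '/home/me/Desktop/matches'), where the path is made up. Keeping the target outside the directory being searched avoids finding the copies again on the next run.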

Re: File::Find find several strings in one directory
by thanos1983 (Parson) on May 10, 2017 at 11:26 UTC

    Hello Staralfur,

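    Another alternative is to use find with multiple parameters and let it traverse your directories. Note that on some systems (such as Cygwin), parentheses are necessary to make the set of extensions inclusive: my @files = `find $path \\( -name '*.c' -o -name '*.txt' \\)`;. Inside Perl backticks the backslashes have to be doubled so that the shell still receives \( and \), and the closing parenthesis must be a separate argument.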

    Sample script (WRONG, see Update2 and Update3 below):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my $path = shift || '.';

        print Dumper traverse($path);

        sub traverse {
            my ($path) = @_;
            my @files = `find $path -name '*.c' -o -name '*.txt'`;
            return if not -d $path;
            opendir my $dh, $path or die;
            while (my $sub = readdir $dh) {
                next if $sub eq '.' or $sub eq '..';
                traverse("$path/$sub");
            }
            closedir $dh;    # directory handles are closed with closedir
            chomp @files;
            return \@files;
        }

        __DATA__
        $ perl file.pl
        $VAR1 = [
                  './counts.txt',
                  './file.txt',
                  './sample.c',
                  './testDir/anotherSample.c',
                  './test.txt'
                ];

    To be 100% honest, though, I think I would also have gone with stevieb's approach, since it is more portable across operating systems. But maybe you can come up with something different.

    Update: Combining stevieb's solution with File::Find (note that this matches file names, not file contents):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;
        use Data::Dumper;

        my @dirs = @ARGV ? @ARGV : ('.');
        my @list;

        find(
            sub {
                push @list, $File::Find::name
                    if -f $_ && $_ =~ /(?:\btest\b|\bsample\b|\bChris\b)/;
            },
            @dirs
        );

        print Dumper \@list;

        __DATA__
        $ perl file.pl
        $VAR1 = [
                  './test.pl~',
                  './sample.c',
                  './test.txt~',
                  './test.pl',
                  './test.txt'
                ];

    Update2: Combining stevieb's solution with a recursive search through the directories:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my @files;
        my $path = shift || '.';

        print Dumper traverse($path);

        sub traverse {
            my ($path) = @_;
            return if not -d $path;
            opendir my $dh, $path or die;
            while (my $sub = readdir $dh) {
                next if $sub eq '.' or $sub eq '..';
                push @files, "$path/$sub"
                    if "$path/$sub" =~ /(?:\btest\b|\banotherSample\b|\bsample\b)/;
                traverse("$path/$sub");
            }
            closedir $dh;    # directory handles are closed with closedir
            return \@files;
        }

        __DATA__
        $ perl file.pl
        $VAR1 = [
                  './test.pl~',
                  './sample.c',
                  './test.txt~',
                  './testDir/anotherSample.c',
                  './test.pl',
                  './test.txt'
                ];

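    Update3: Another alternative is to use find with multiple parameters and let it traverse your directories. Note that on some systems (such as Cygwin), parentheses are necessary to make the set of extensions inclusive: my @files = `find $path \\( -name '*.c' -o -name '*.txt' \\)`;. Inside Perl backticks the backslashes have to be doubled so that the shell still receives \( and \), and the closing parenthesis must be a separate argument.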

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my $path  = shift || '.';
        my @files = `find $path -name '*.c' -o -name '*.txt'`;
        chomp @files;

        print Dumper \@files;

        __DATA__
        $ perl file.pl
        $VAR1 = [
                  './test.pl~',
                  './sample.c',
                  './test.txt~',
                  './testDir/anotherSample.c',
                  './test.pl',
                  './test.txt'
                ];
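
    The same selection by extension can probably be done without shelling out to find at all, which sidesteps the quoting and Cygwin differences mentioned above. Below is a minimal sketch using the core File::Find module; files_with_extensions is a hypothetical helper name, and the extensions c and txt are simply the ones from the find command.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect the plain files below $path whose names end in one of the
# given extensions (passed without the leading dot).
sub files_with_extensions {
    my ($path, @extensions) = @_;
    my $alternation = join '|', map { quotemeta($_) } @extensions;
    my $wanted_ext  = qr/\.(?:$alternation)\z/;

    my @files;
    find(sub {
        push @files, $File::Find::name if -f $_ && $_ =~ $wanted_ext;
    }, $path);

    my @sorted = sort @files;
    return @sorted;
}

my $path = shift || '.';
print "$_\n" for files_with_extensions($path, 'c', 'txt');
```

    Because the pattern is anchored with \z, backup files such as test.txt~ are not picked up.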

    Hope this helps.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: File::Find find several strings in one directory
by karlgoethebier (Abbot) on May 10, 2017 at 17:58 UTC
    "...it seems to be the easiest way..."

    Who knows. Consider File::Find::Rule:

        # untested
        use File::Find::Rule;

        my @files = File::Find::Rule->file()
                                    ->name('*.html')
                                    ->grep($pattern)
                                    ->in($dir);

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    Furthermore I consider that Donald Trump must be impeached as soon as possible

Node Type: perlquestion [id://1189923]
Front-paged by Corion