Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Passing variable to if statement

by PeachCityDude (Novice)
on Jul 16, 2014 at 20:07 UTC ( #1093932=perlquestion: print w/replies, xml ) Need Help??
PeachCityDude has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to search a large number of files (file1, file2, etc.). I want to search each file for a different name (Joe, Lee, etc.). I want to see if "Joe" appears in "file1" and "Lee" appears in "file2". I begin with a text file (array.txt) that contains the files and names and put this into an array.

array.txt: file1 Joe file2 Lee ... ...

Then, I use "foreach" to work through the items in the array. In the following code, the files "file1" and "file2" will open with each iteration of the "foreach" loop. However, the $name variable in the "if" statement doesn't seem to loop through the names (Joe, Lee) but rather stays on the first name (Joe). Therefore, both "file1" and "file2" are searched for "Joe". I only want to search "file1" for "Joe" then search "file2" for "Lee". Any help is much appreciated!

#!/usr/bin/perl -w use strict; use warnings; use diagnostics; my @filelist = (); my $file=''; my $name=''; open(FILE,"array.txt")||die "Can't open array.txt"; @filelist=<FILE>; close(FILE); foreach my $line (@filelist){ my $file=substr($line,0,5); chomp $file; my $name=substr($line,6,3); chomp $name; open (FILE2,"$file"); while ( <FILE2>) { if (/$name/ismo) { print ("$name found in $file\n"); } } close(FILE2); }

Replies are listed 'Best First'.
Re: Passing variable to if statement
by AppleFritter (Priest) on Jul 16, 2014 at 20:16 UTC

    The problem is with this line:

    if (/$name/ismo) {

    Remove the /o ("compile only once") modifier to your regex, and it'll work just fine.

    Here's what Regexp Quote Like Operators has to say on /o:

    PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated, except for when the delimiter is a single quote. (Note that $( , $) , and $| are not interpolated because they look like end-of-string tests.) Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a /o (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed. But now, the only reasons to use /o are either:

    1. The variables are thousands of characters long and you know that they don't change, and you need to wring out the last little bit of speed by having Perl skip testing for that. (There is a maintenance penalty for doing this, as mentioning /o constitutes a promise that you won't change the variables in the pattern. If you do change them, Perl won't even notice.)

    2. you want the pattern to use the initial values of the variables regardless of whether they change or not. (But there are saner ways of accomplishing this than using /o.)

    3. If the pattern contains embedded code, such as

      use re 'eval'; $code = 'foo(?{ $x })'; /$code/

      then perl will recompile each time, even though the pattern string hasn't changed, to ensure that the current value of $x is seen each time. Use /o if you want to avoid this.

    The bottom line is that using /o is almost never a good idea.

    Emphasis mine.

      For that matter, remove the /sm, too. They're pointless here.
Re: Passing variable to if statement
by PerlSufi (Friar) on Jul 16, 2014 at 20:19 UTC
    In addition to what AppleFritter said, use the best practice way of file handling:
    use strict; use warnings; use autodie; my $file = 'array.txt'; open(my $fh, '<', $file);

      Another Perl best practice:  Always decode text upon input and encode text upon output—and always to do it explicitly. Don't leave the handling of the character encoding of the text to whatever your version of perl in your environment (locale) defaults to.

      use strict; use warnings; use autodie qw( open close ); binmode STDOUT, ':encoding(UTF-8)'; # Or ASCII, Latin-1, etc. my $file = 'array.txt'; open my $fh, '<:encoding(UTF-8)', $file; # ... close $fh;


      use strict; use warnings; use autodie qw( open close ); use open qw( :encoding(UTF-8) :std ); # Or ASCII, Latin-1, etc. my $file = 'array.txt'; open my $fh, '<', $file; # ... close $fh;

      I think OP's original code is fine in this regard. I'd say autodie is helpful for those who know all the things that it does, or e.g. for dropping into legacy scripts that don't do enough error checking. But, as I've discovered, it can also cause problems when it's used, and its bug trackers (1, 2) show I'm not the only one. Don't get me wrong, it's a good recommendation, I just disagree that it's a "best practice" - it's just MTOWTDI.

        I think OP's original code is fine in this regard.

        Whoops, I missed the missing or die in the second open. Nevermind that sentence.

      Thank you for the tip on file handling.
Re: Passing variable to if statement
by neilwatson (Priest) on Jul 16, 2014 at 20:19 UTC

    Immediate guess the the 'once' in your regex /$name/ismo. That means compile regex only once because I do not expect it change. But it does :)

    Neil Watson

Re: Passing variable to if statement
by PeachCityDude (Novice) on Jul 16, 2014 at 20:26 UTC
    I removed the "once" modifier and now the code works perfectly. Thank you very much!
      I removed the "once" modifier and now the code works perfectly.

      I don't think the code is working perfectly. If any of the names in array.txt have regular expression metacharacters in them, then the regular expression pattern you're using is unlikely to match the text as you expect it to.

      See perldoc -f quotemeta.

Re: Passing variable to if statement
by Jim (Curate) on Jul 17, 2014 at 00:17 UTC

    Consider this refactored version of your Perl script. (It hasn't been tested at all.)

    use strict; use warnings; use autodie qw( open close ); use open qw( :encoding(UTF-8) :std ); my %terms_in; # Hash of hash of terms in each file my $terms_file = 'array.txt'; open my $fh1, '<', $terms_file; while (my $record = <$fh1>) { my ($file, $term) = $record =~ m/^(\S+)\s+(\S+)/; $terms_in{ $file }{ $term } = 1; } close $fh1; FILE: for my $file (keys %terms_in) { open my $fh2, '<', $file; while (my $line = <$fh2>) { for my $term (keys %$terms_in{ $file }) { if (index $line, $term) { print "Term $term found in file $file\n"; # Don't search terms that have already been found... delete $terms_in{ $file }{ $term }; } } # Don't keep searching file once all terms have been found... if (scalar keys %$terms_in{ $file } == 0) { close $fh2; next FILE; } } close $fh2; } # At this point, what remains in %terms_in are all the terms that # were not found in their respective files, which might be useful. exit 0;

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1093932]
Approved by AppleFritter
Front-paged by boftx
[thezip]: Gosh my Unix is lacking these days... How can I get the line count of all CSV files contained within a directory hierarchy containing subdirectories of the CSV files?
[thezip]: Something like: xargs find . -mindepth 2 -name '*.csv' | wc -l or such?

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (10)
As of 2016-12-06 17:52 GMT
Find Nodes?
    Voting Booth?
    On a regular basis, I'm most likely to spy upon:

    Results (113 votes). Check out past polls.