
regex with arrays and variables

by xorl (Deacon)
on Apr 25, 2013 at 17:52 UTC
xorl has asked for the wisdom of the Perl Monks concerning the following question:

I have two arrays. The first array contains several strings. The second array contains elements, each of which might be a substring of an element in the first array.

Now I want to find the elements of the first array that contain one of the substrings in the second array, skip them, and do something with the remaining elements of the first array.

My code:

#!/usr/bin/perl
use strict;

my @array = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
my @array2 = ("foo", "bar", "quux");

foreach my $i (@array) {
    chomp $i;
    print "checking $i\n";
    if ($i =~ m/@array2/) {
        print "$i is in array2 - skipping\n";
        next;
    }
    # Do something with $i now.
}

Of course, that didn't work.

So I then tried adding this before the loop:

my $regex = "(";
foreach (@array2) {
    $regex .= $_ . "|";
}
$regex .= ")";
and then changing the if to:
if ($i =~ m/$regex/) {

That matched every element, even though it shouldn't have matched the last one (I'm really perplexed by this).

So now, as a last resort, I've put in an inner loop that goes through @array2 and checks each element:

#!/usr/bin/perl
use strict;

my @array = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
my @array2 = ("foo", "bar", "quux");

OUTTER: foreach my $i (@array) {
    chomp $i;
    print "checking $i\n";
    foreach my $j (@array2) {
        if ($i =~ m/$j/) {
            print "$i is in array2 (matches $j) - skipping\n";
            next OUTTER;
        }
    }
    print "$i IS NOT IN ARRAY 2\n";
    # Do something with $i now.
}

So is this really the only way to do it? I can foresee having a very large @array2 in the not-too-distant future, and I don't think this code will scale well with that (plus, to me, the extra loop makes the code hard to read). Anyway, I'm just hoping there's a better way.

Thanks in advance

Edit: Thanks to everyone who found the problem with the $regex.

Re: regex with arrays and variables
by jwkrahn (Monsignor) on Apr 25, 2013 at 18:19 UTC
    if ($i =~ m/@array2/) { print "$i is in array2 - skipping\n";

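    That regular expression is m/foo bar quux/: an array interpolated into a pattern, just as in a double-quoted string, has its elements joined with $" (a single space by default). In English the test says "the string 'foo bar quux' is in $i", not "$i is in @array2". It won't work because you are looking for the literal string "foo bar quux".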
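    To see what the pattern in the question actually interpolates to, here is a small sketch (the second test string is made up purely for illustration):

```perl
use strict;
use warnings;

my @array2 = ("foo", "bar", "quux");

# An array interpolated into a string (or a pattern) is joined with $",
# which defaults to a single space.
my $pattern = "@array2";
print "pattern is: $pattern\n";    # pattern is: foo bar quux

for my $i ("foo {abc123}", "foo bar quux {xyz}") {
    # Same interpolation as m/@array2/ in the question.
    my $found = ($i =~ m/@array2/) ? "matches" : "does not match";
    print "$i $found\n";
}
```

    Only a string containing the whole space-separated run "foo bar quux" matches, which is why the question's first attempt never skipped anything.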

    my $regex = "("; foreach (@array2) { $regex .= $_ . "|"; } $regex .= ")";

    $regex now contains the string "(foo|bar|quux|)", which says to match EITHER "foo" OR "bar" OR "quux" OR "", and EVERY string will match "".
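    A minimal sketch that rebuilds the regex exactly as the question's loop does and shows the empty alternative matching a string that contains none of the words:

```perl
use strict;
use warnings;

my @array2 = ("foo", "bar", "quux");

# The loop from the question: every element is followed by "|",
# so the last alternative is empty.
my $regex = "(";
foreach (@array2) {
    $regex .= $_ . "|";
}
$regex .= ")";
print "$regex\n";    # (foo|bar|quux|)

# The empty alternative matches at position 0 of any string,
# so even "baz {ghi789}" matches, and $1 is the empty string.
if ("baz {ghi789}" =~ m/$regex/) {
    print "matched, \$1 = '$1'\n";    # matched, $1 = ''
}
```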

    You need something like:

    my $regex = "(";
    $regex .= join "|", @array2;
    $regex .= ")";
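    Dropped into the question's loop, that fix might look like the sketch below; collecting the survivors in @kept is a hypothetical stand-in for "Do something with $i now".

```perl
use strict;
use warnings;

my @array  = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
my @array2 = ("foo", "bar", "quux");

# join puts "|" only between elements, so there is no empty alternative.
my $regex = "(";
$regex .= join "|", @array2;
$regex .= ")";    # (foo|bar|quux)

my @kept;
foreach my $i (@array) {
    chomp $i;
    if ($i =~ m/$regex/) {
        print "$i is in array2 - skipping\n";
        next;
    }
    push @kept, $i;    # hypothetical: "Do something with $i now."
}
print "kept: @kept\n";    # kept: baz {ghi789}
```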
Re: regex with arrays and variables
by NetWallah (Canon) on Apr 25, 2013 at 18:20 UTC
    Your regex concat command creates a regex that looks like:

        (foo|bar|quux|)

    That trailing "|" allows an empty match.

    This is why all elements matched.

    Try removing the trailing "|"; alternation with "|" is the canonical way to match any of a bunch of different things. Also, you do not need the parens unless you are capturing (or grouping the alternation with other parts of the pattern).
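    A small sketch of the point about parentheses: for a plain yes/no test the grouped and ungrouped alternations agree, and the only difference is that the parens fill $1. (If you ever need grouping without capturing, Perl's (?:...) does that.)

```perl
use strict;
use warnings;

my @array2 = ("foo", "bar", "quux");

my $without_parens = join "|", @array2;                # foo|bar|quux
my $with_parens    = "(" . join("|", @array2) . ")";   # (foo|bar|quux)

for my $s ("foo {abc123}", "bar {def456}", "baz {ghi789}") {
    # For a yes/no test the two patterns always give the same answer.
    my $plain   = ($s =~ m/$without_parens/) ? "yes" : "no";
    my $grouped = ($s =~ m/$with_parens/)    ? "yes" : "no";
    print "$s: $plain / $grouped\n";
}

# The parentheses only add a capture group, which fills $1.
if ("bar {def456}" =~ m/$with_parens/) {
    print "captured: $1\n";    # captured: bar
}
```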

    Lastly, use a "join" instead of looping and concatenating - that will also avoid the trailing "|".


Re: regex with arrays and variables
by moritz (Cardinal) on Apr 25, 2013 at 18:22 UTC
    my $regex = "("; foreach (@array2) { $regex .= $_ . "|"; } $regex .= ")";

    You're nearly there. The problem with this code is that it produces something like (foo|bar|quux|), and the trailing pipe symbol makes the empty string match. To fix that, use

    my $regex = join '|', @array2;

    If you don't want regex metacharacters in @array2 to act specially, you should go one step further and write

    my $regex = join '|', map quotemeta, @array2;
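    To illustrate why quotemeta matters, here is a sketch with two hypothetical entries, "a.c" and "x+y", that contain regex metacharacters:

```perl
use strict;
use warnings;

# Hypothetical entries containing regex metacharacters.
my @array2 = ("a.c", "x+y");

my $raw    = join '|', @array2;                  # a.c|x+y
my $quoted = join '|', map quotemeta, @array2;   # a\.c|x\+y

for my $s ("abc", "a.c", "xxy", "x+y") {
    my $r = ($s =~ m/$raw/)    ? "match" : "no match";
    my $q = ($s =~ m/$quoted/) ? "match" : "no match";
    print "$s: raw=$r quoted=$q\n";
}
```

    Unquoted, "a.c" also matches "abc" (the dot matches any character) and "x+y" means "one or more x, then y", so it matches "xxy" but not the literal text "x+y". With quotemeta, each entry only matches itself.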
Re: regex with arrays and variables
by Random_Walk (Prior) on Apr 26, 2013 at 08:18 UTC

    If you have a lot of regexes to use and want to know which one matched, you can speed things up by pre-compiling them with qr//.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @array = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
    my @patterns = ("foo", "bar", "quux");

    # pre-compile each regex once, keyed by its source string
    my %regex;
    for (@patterns) {
        $regex{$_} = qr/$_/;
    }

    ITEM: for my $i (@array) {
        chomp $i;
        print "checking $i\n";
        for (keys %regex) {
            if ($i =~ $regex{$_}) {
                print "$i is in patterns (matches $_) - skipping\n";
                next ITEM;    # skip straight to the next element of @array
            }
        }
        print "$i IS NOT IN PATTERNS\n";
        # Do something with $i now.
    }
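    A different technique from the hash of regexes above, sketched here under the assumption that the entries are plain strings rather than patterns: compile a single alternation with one capture group, so $1 tells you which entry matched. Collecting survivors in @kept is again a hypothetical stand-in for the real work.

```perl
use strict;
use warnings;

my @array    = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
my @patterns = ("foo", "bar", "quux");

# One compiled pattern for all entries: quotemeta keeps each entry literal,
# and the capture group records which alternative matched.
my $any = join '|', map quotemeta, @patterns;
my $re  = qr/($any)/;

my @kept;
for my $i (@array) {
    chomp $i;
    if ($i =~ $re) {
        print "$i is in patterns (matches $1) - skipping\n";
        next;
    }
    push @kept, $i;    # hypothetical: "Do something with $i now."
}
print "kept: @kept\n";
```

    Perl 5.10 and later compile alternations of literal strings into a trie, which tends to make one combined pattern faster than looping over many separate regexes; it is worth measuring with the core Benchmark module on real data. If entries overlap (say "foo" and "foobar"), $1 is the first listed alternative that matches at the leftmost position, so the order of @patterns matters.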


Re: regex with arrays and variables
by Rahul6990 (Beadle) on Apr 26, 2013 at 06:29 UTC

    This is shorter and works the same way.
    You do not need to add "(" and ")" to your regex.

    my @array = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
    my @array2 = ("foo", "bar", "quux");
    my $regex = join "|", @array2;    # build the pattern once, before the loop
    foreach my $i (@array) {
        chomp $i;
        if ($i =~ m/$regex/) {
            print "$i is in array2\n";
            next;
        }
        # Do something with $i now.
    }
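    The same idea can be written even more compactly with grep. This is a sketch, with the pattern built once outside the filter and quotemeta applied in case any entry contains metacharacters:

```perl
use strict;
use warnings;

my @array  = ("foo {abc123}\n", "bar {def456}\n", "baz {ghi789}\n");
my @array2 = ("foo", "bar", "quux");

# Build the pattern once rather than on every iteration.
my $regex = join '|', map quotemeta, @array2;

chomp @array;    # chomp accepts a list and strips the newline from each element
my @rest = grep { !/$regex/ } @array;    # keep elements matching none of @array2

print "$_\n" for @rest;    # baz {ghi789}
```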

    Happy Programming...

Node Type: perlquestion [id://1030711]
Approved by Old_Gray_Bear