http://www.perlmonks.org?node_id=832481

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

When writing a regex that has multiple patterns in it, it would be nice to know which of the patterns matched. I've looked around and can't seem to find what I'm looking for. For example:
$pat1="field";
$pat2="f.i.e.l.d";
$pat3="the";
$str = "There are many soccer fields in England - f1i2e3l4d";
while($str =~ m/($pat1|$pat2|$pat3)/ig){
    print "Found '$1' from pattern ??\n";
}

What I get is:
Found 'The' from pattern ??
Found 'field' from pattern ??
Found 'f1i2e3l4d' from pattern ??

What I'd like is to find the syntax to replace ?? with the matching pattern:
Found 'The' from pattern 'the'
Found 'field' from pattern 'field'
Found 'f1i2e3l4d' from pattern 'f.i.e.l.d'

Any thoughts?

Replies are listed 'Best First'.
Re: Determining which pattern matched
by rubasov (Friar) on Apr 02, 2010 at 13:40 UTC
    There are several ways to achieve this, for example you can use named captures and the %+ hash:
    use strict;
    use warnings;

    my $pat1 = qr/(?<pat1>field)/;
    my $pat2 = qr/(?<pat2>f.i.e.l.d)/;
    my $pat3 = qr/(?<pat3>the)/;
    my $str = "There are many soccer fields in England - f1i2e3l4d";

    while ( $str =~ m/($pat1|$pat2|$pat3)/ig ) {
        print "Found '$1' from pattern ", keys %+, "\n";
    }
    Or you can use perl code embedded in regexes:
    my $pat_id;
    my $pat1 = qr/field(?{$pat_id=1})/;
    my $pat2 = qr/f.i.e.l.d(?{$pat_id=2})/;
    my $pat3 = qr/the(?{$pat_id=3})/;
    my $str = "There are many soccer fields in England - f1i2e3l4d";

    while ( $str =~ m/($pat1|$pat2|$pat3)/ig ) {
        print "Found '$1' from pattern pat", $pat_id, "\n";
    }
    Or you can use the MARK backtracking control verb and the $REGMARK variable:
    my $pat_id;
    our $REGMARK;
    my $pat1 = qr/field(*MARK:pat1)/;
    my $pat2 = qr/f.i.e.l.d(*MARK:pat2)/;
    my $pat3 = qr/the(*MARK:pat3)/;
    my $str = "There are many soccer fields in England - f1i2e3l4d";

    while ( $str =~ m/($pat1|$pat2|$pat3)/ig ) {
        print "Found '$1' from pattern ", $REGMARK, "\n";
    }
    Take a look at perlre; these are all documented there. To start with, I would recommend named captures, as they are probably the simplest to handle.

    Hope this helps.

    update: I've just noticed that I changed the meaning of your original match, because I used precompiled subpatterns. A qr// object keeps its own modifiers, so the outer /i does not apply inside it. In that case you would want to include the /i modifier in each subpattern itself: qr/the/i
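A minimal sketch of the named-capture approach with the /i modifier moved into each precompiled subpattern, as the update suggests. It assumes Perl 5.10 or later (for named captures and %+), and the @hits array is only a hypothetical helper added here to collect the results; it is not part of the original reply.

```perl
use 5.010;    # named captures and %+ need Perl 5.10 or later
use strict;
use warnings;

# Each qr// carries its own /i, because modifiers on the outer match
# are not applied inside a precompiled subpattern.
my $pat1 = qr/(?<pat1>field)/i;
my $pat2 = qr/(?<pat2>f.i.e.l.d)/i;
my $pat3 = qr/(?<pat3>the)/i;

my $str = "There are many soccer fields in England - f1i2e3l4d";

my @hits;     # hypothetical helper: one [text, name] pair per match
while ( $str =~ /$pat1|$pat2|$pat3/g ) {
    # Only the alternative that matched has a defined named group,
    # so %+ holds exactly one key here.
    my ($name) = keys %+;
    my $text = $+{$name};
    push @hits, [ $text, $name ];
    print "Found '$text' from pattern $name\n";
}
```

Only the group that took part in the match appears in %+, which is why a single key can be read off directly.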

      Use local our instead of my for variables declared outside of a (?{}) and (??{}) but used inside of them. It's much safer. Your second snippet won't work if it's moved into a function, for example.

        I just wanted to demonstrate the technique without much clutter, but you're absolutely right. I should take care to demonstrate the best practice, which is using local our instead of my along with (?{ }).

        Here's a great thread explaining the reason behind this for the interested monks (including ikegami's notes): Regexes: finding ALL matches (including overlap).
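A small sketch of the local our advice applied to the embedded-code variant, wrapped in a function. The function name find_matches and the returned list of [text, id] pairs are assumptions made for illustration; the patterns are written literally in the regex so that no `use re 'eval'` is required.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [matched text, pattern id] pairs.
sub find_matches {
    my ($str) = @_;

    # A package variable localized to this call, rather than a lexical,
    # is what the (?{ }) blocks assign to.
    local our $pat_id;

    my @found;
    while ( $str =~ m/(field(?{ $pat_id = 1 })|f.i.e.l.d(?{ $pat_id = 2 })|the(?{ $pat_id = 3 }))/ig ) {
        # Each code block sits at the end of its alternative, so the last
        # assignment comes from the alternative that actually matched.
        push @found, [ $1, $pat_id ];
    }
    return @found;
}

my @found = find_matches("There are many soccer fields in England - f1i2e3l4d");
print "Found '$_->[0]' from pattern pat$_->[1]\n" for @found;
```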

Re: Determining which pattern matched
by Fletch (Bishop) on Apr 02, 2010 at 13:38 UTC

    Don't capture into a single set of parens; use a set for each chunk you want distinguishable and then look in @- and figure out which one matched.

    $ perl -MYAML::Syck=Dump -le '$_="fum";/(fee)|(fie)|(foe)|(fum)/;print + Dump( \@- );print "paren ", grep( defined $-[$_], 1..4), " matched"'
    ---
    - 0
    - ~
    - ~
    - ~
    - 0
    paren 4 matched

    The cake is a lie.

      No need to look into @-; only the capture variable for the matching pattern will be defined ($1, $2, $3 ...)
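A minimal sketch of the one-group-per-pattern idea from this reply, generalized to a list of patterns. It relies only on @- and @+, so it should also run on Perl 5.8. The names @patterns, $alternation, and @found are illustrative assumptions, and the sketch assumes the individual patterns contain no capturing groups of their own (that would shift the group numbers).

```perl
use strict;
use warnings;

# Illustrative pattern list; a pattern's position decides its group number.
my @patterns = ( 'field', 'f.i.e.l.d', 'the' );
my $str = "There are many soccer fields in England - f1i2e3l4d";

# Wrap each pattern in its own capturing group: (field)|(f.i.e.l.d)|(the)
my $alternation = join '|', map { "($_)" } @patterns;

my @found;    # one [text, pattern] pair per match
while ( $str =~ /$alternation/ig ) {
    for my $n ( 1 .. @patterns ) {
        # $-[$n] is defined only for the group that took part in the match
        next unless defined $-[$n];
        my $text    = substr $str, $-[$n], $+[$n] - $-[$n];
        my $pattern = $patterns[ $n - 1 ];
        push @found, [ $text, $pattern ];
        print "Found '$text' from pattern '$pattern'\n";
        last;
    }
}
```

Checking the offsets in a loop avoids symbolic references to $1, $2, and so on, and keeps the code working when patterns are added to the list.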
Re: Determining which pattern matched
by FunkyMonk (Chancellor) on Apr 02, 2010 at 13:46 UTC
    As an alternative to the above, you could do something like...

    my @patterns = ("field", "f.i.e.l.d", "the");
    my $str = "There are many soccer fields in England - f1i2e3l4d f1i2e3l4d";

    my %matches = map { $_ => [ $str =~ /$_/g ] } @patterns;

    __END__
    {
      "f.i.e.l.d" => ["f1i2e3l4d", "f1i2e3l4d"],
      field => ["field"],
      the => [],
    }

    Unless I state otherwise, all my code runs with strict and warnings
Re: Determining which pattern matched
by BrowserUk (Patriarch) on Apr 02, 2010 at 13:52 UTC

    My variation of the named captures theme:

    #! perl -slw
    use strict;

    my $pat1 = "field";
    my $pat2 = "f.i.e.l.d";
    my $pat3 = "the";
    my $str = "There are many soccer fields in England - f1i2e3l4d";

    while( $str =~ m/((?'A'$pat1)|(?'B'$pat2)|(?'C'$pat3))/ig ){
        printf "Found '$1' from pattern '%s'\n",
            grep{ defined $-{ $_ }[ 0 ]; } keys %-;
    }

    __END__
    C:\test>junk
    Found 'The' from pattern 'C'
    Found 'field' from pattern 'A'
    Found 'f1i2e3l4d' from pattern 'B'

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Determining which pattern matched
by CountZero (Bishop) on Apr 02, 2010 at 15:21 UTC
    Or use Regexp::Assemble.

    And if you have many different alternatives to match from, it is probably faster too!

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Determining which pattern matched
by Anonymous Monk on Apr 05, 2010 at 22:00 UTC
    Thanks for all the responses. It looks like many of them use features of 5.10. Some of the boxen that will run the code are stuck on early 5.8, so I'll probably need to use some less efficient code to achieve the result I'm looking for.

    I'm also looking forward to playing around with the 5.10 solutions.