Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Find out which subpattern matched in regex

by Dallaylaen (Scribe)
on Jun 14, 2013 at 12:19 UTC ( #1038952=perlquestion: print w/ replies, xml ) Need Help??
Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I have a set of regural expression, and I want to match a string against them all together and do something depending on which of those expressions did match.

If those are just words, life's easy: put them all in parentheses and see what's in $1 after match.

I think I can use named captures:

#!/usr/bin/perl -w use 5.010; # want named captures use strict; use warnings; my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(o+ka)?/ ); my $i; # regex index my $giant_regex = join "|", map { $i++; "(?<r$i>$_)" } @reglist; $giant_regex = qr/($giant_regex)/; foreach (qw(foobarbaz football barcode none bazoooooka)) { $_ =~ m/$giant_regex/; my @match = keys %+; print "@match\n"; }

The above code sample works, but maybe there are simpler variants and/or variants compatible with older perls?

Comment on Find out which subpattern matched in regex
Download Code
Re: Find out which subpattern matched in regex
by space_monk (Chaplain) on Jun 14, 2013 at 12:26 UTC

    You can use "given/when" constructs to match against a regex array from 5.10 on

    given ($_) { when (@reglist) { # you have a match } default { # you don't have a match } }
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Re: Find out which subpattern matched in regex
by Anonymous Monk on Jun 14, 2013 at 12:37 UTC
Re: Find out which subpattern matched in regex
by hdb (Parson) on Jun 14, 2013 at 12:51 UTC

    This requires a nested loop:

    use strict; use warnings; my %reglist = ( qr/food?/ => sub { print "\t1\n" }, qr/b[a4]rd?/ => sub { print "\t2\n" }, qr/baz(o+ka)?/ => sub { print "\t3\n" } ); foreach (qw(foobarbaz football barcode none bazoooooka)) { print "$_\n"; for my $regex (keys %reglist) { $reglist{$regex}() if /$regex/; } }

    UPDATE: If you are happy NOT to use captures in your regexes, you can just see which one has matched from the position in the matches. Due to this restriction I have changed your third regex to non-capturing parantheses.

    use warnings; my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(?:o+ka)?/ ); my $giant = join( ")|(", @reglist ); foreach (qw(foobarbaz football barcode none bazoooooka)) { my @hits = /($giant)/; my @which = map { defined $hits[$_] ? $_ : "" } 0..@hits-1; print "$_: @which\n"; }

      Thanks for you comment.

      I did a quick benchmark and it turns out that nested loop solution is the fastest in most cases. In all other cases - I believe it's when regular expressions converge into a nice tree-like structure - both named captures and unnamed captures are about the same.

      Here's my code sample:

      #!/usr/bin/perl -w use 5.010; use strict; use warnings; use Test::More tests => 3; use Benchmark qw(cmpthese); my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(?:o+ka)?/, 100..999); my @lines = (qw(foobarbaz b4rd perl bazooooka football)); my @expect = ("r0", "r1", "", "r2", "r0"); is_deeply(which_reg_loop(\@reglist, \@lines), \@expect, "which_reg_loo +p") and is_deeply(which_reg_capt(\@reglist, \@lines), \@expect, "which_reg_cap +t") and is_deeply(which_reg_named(\@reglist, \@lines), \@expect, "which_reg_na +med") or die "Results differ, no bench"; @lines = @lines x 1000; cmpthese ( -1, { loop => sub { which_reg_loop(\@reglist, \@lines); }, capt => sub { which_reg_capt(\@reglist, \@lines); }, named => sub { which_reg_named(\@reglist, \@lines); }, }); sub which_reg_loop { my ($reglist, $lines) = @_; my @ret; LINE: foreach my $str (@$lines) { for (my $i = 0; $i < @$reglist; $i++) { $str =~ $reglist->[$i] or next; push @ret, "r$i"; next LINE; }; push @ret, ''; }; return \@ret; }; sub which_reg_capt { my ($reglist, $lines) = @_; my $giant = join "|", map { "($_)" } @$reglist; $giant = qr($giant); my @ret; LINE: foreach (@$lines) { my @hits = $_ =~ $giant; for (my $i = 0; $i < @hits; $i++) { $hits[$i] or next; push @ret, "r$i"; next LINE; }; push @ret, ''; }; return \@ret; }; sub which_reg_named { my ($reglist, $lines) = @_; my $giant = join "|", map { "(?<r$_>$reglist->[$_])" } 0..$#$regli +st; $giant = qr($giant); my @ret = map { $_ =~ $giant ? (keys %+) : '' } @$lines; return \@ret; };

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1038952]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (8)
As of 2014-07-31 21:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (253 votes), past polls