Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Find out which subpattern matched in regex

by Dallaylaen (Chaplain)
on Jun 14, 2013 at 12:19 UTC ( [id://1038952]=perlquestion: print w/replies, xml ) Need Help??

Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I have a set of regural expression, and I want to match a string against them all together and do something depending on which of those expressions did match.

If those are just words, life's easy: put them all in parentheses and see what's in $1 after match.

I think I can use named captures:

#!/usr/bin/perl -w use 5.010; # want named captures use strict; use warnings; my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(o+ka)?/ ); my $i; # regex index my $giant_regex = join "|", map { $i++; "(?<r$i>$_)" } @reglist; $giant_regex = qr/($giant_regex)/; foreach (qw(foobarbaz football barcode none bazoooooka)) { $_ =~ m/$giant_regex/; my @match = keys %+; print "@match\n"; }

The above code sample works, but maybe there are simpler variants and/or variants compatible with older perls?

Replies are listed 'Best First'.
Re: Find out which subpattern matched in regex
by hdb (Monsignor) on Jun 14, 2013 at 12:51 UTC

    This requires a nested loop:

    use strict; use warnings; my %reglist = ( qr/food?/ => sub { print "\t1\n" }, qr/b[a4]rd?/ => sub { print "\t2\n" }, qr/baz(o+ka)?/ => sub { print "\t3\n" } ); foreach (qw(foobarbaz football barcode none bazoooooka)) { print "$_\n"; for my $regex (keys %reglist) { $reglist{$regex}() if /$regex/; } }

    UPDATE: If you are happy NOT to use captures in your regexes, you can just see which one has matched from the position in the matches. Due to this restriction I have changed your third regex to non-capturing parantheses.

    use warnings; my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(?:o+ka)?/ ); my $giant = join( ")|(", @reglist ); foreach (qw(foobarbaz football barcode none bazoooooka)) { my @hits = /($giant)/; my @which = map { defined $hits[$_] ? $_ : "" } 0..@hits-1; print "$_: @which\n"; }

      Thanks for you comment.

      I did a quick benchmark and it turns out that nested loop solution is the fastest in most cases. In all other cases - I believe it's when regular expressions converge into a nice tree-like structure - both named captures and unnamed captures are about the same.

      Here's my code sample:

      #!/usr/bin/perl -w use 5.010; use strict; use warnings; use Test::More tests => 3; use Benchmark qw(cmpthese); my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(?:o+ka)?/, 100..999); my @lines = (qw(foobarbaz b4rd perl bazooooka football)); my @expect = ("r0", "r1", "", "r2", "r0"); is_deeply(which_reg_loop(\@reglist, \@lines), \@expect, "which_reg_loo +p") and is_deeply(which_reg_capt(\@reglist, \@lines), \@expect, "which_reg_cap +t") and is_deeply(which_reg_named(\@reglist, \@lines), \@expect, "which_reg_na +med") or die "Results differ, no bench"; @lines = @lines x 1000; cmpthese ( -1, { loop => sub { which_reg_loop(\@reglist, \@lines); }, capt => sub { which_reg_capt(\@reglist, \@lines); }, named => sub { which_reg_named(\@reglist, \@lines); }, }); sub which_reg_loop { my ($reglist, $lines) = @_; my @ret; LINE: foreach my $str (@$lines) { for (my $i = 0; $i < @$reglist; $i++) { $str =~ $reglist->[$i] or next; push @ret, "r$i"; next LINE; }; push @ret, ''; }; return \@ret; }; sub which_reg_capt { my ($reglist, $lines) = @_; my $giant = join "|", map { "($_)" } @$reglist; $giant = qr($giant); my @ret; LINE: foreach (@$lines) { my @hits = $_ =~ $giant; for (my $i = 0; $i < @hits; $i++) { $hits[$i] or next; push @ret, "r$i"; next LINE; }; push @ret, ''; }; return \@ret; }; sub which_reg_named { my ($reglist, $lines) = @_; my $giant = join "|", map { "(?<r$_>$reglist->[$_])" } 0..$#$regli +st; $giant = qr($giant); my @ret = map { $_ =~ $giant ? (keys %+) : '' } @$lines; return \@ret; };
Re: Find out which subpattern matched in regex
by space_monk (Chaplain) on Jun 14, 2013 at 12:26 UTC

    You can use "given/when" constructs to match against a regex array from 5.10 on

    given ($_) { when (@reglist) { # you have a match } default { # you don't have a match } }
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Re: Find out which subpattern matched in regex
by Anonymous Monk on Jun 14, 2013 at 12:37 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1038952]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-04-24 08:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found