http://www.perlmonks.org?node_id=948417

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a problem I am trying to solve with regular expressions. I want to return all locations of an array of patterns inside a string. I also only want the patterns to match at locations which are a multiple of 3. For example, so far I have the following.

my $string="AAABBBCCCCDDDEEEFFFGGGHHHIII";
my @patterns=('BBB','DDD');
my @index;
foreach(@patterns){
    while($string=~m/([A-Z]{3})+?$_/g){
        push(@index,$-[2]);
        pos($string)=$-[2];
    }

The problem is that it matches the first pattern correctly, but it also matches 'CCCDDD' on the second pass through the loop, because I cannot (or do not know how to) anchor the pattern to start at a specific location.

To be clear, I only want a pattern in the array to match if the number of characters before it is evenly divisible by three.

Is there a good way to do this?

Thanks

Replies are listed 'Best First'.
Re: Perl Regex Repeating Patterns
by Corion (Patriarch) on Jan 17, 2012 at 21:21 UTC

    Having a quantifier after capturing parentheses is almost always an error.

    Maybe you want to directly populate your list?

    my @matches = ($string =~ /([A-Z]{3})/g);

    but that approach will not work if your input data contains non-alphabetical chars, like:

    AAA..BBB

    If you want a match to start at a specific position, see perlre on \G.
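One way to read the \G hint, as a minimal sketch rather than a definitive solution: pin each attempt to pos($string) with \G, skip whole three-character groups with a non-capturing, non-greedy (?:.{3})*?, and capture the pattern itself so that $-[1] is its offset. The \Q...\E quoting and the /s flag are assumptions added here (the patterns are treated as literal text, and any character may fill a group).

```perl
use strict;
use warnings;

my $string   = "AAABBBCCCCDDDEEEFFFGGGHHHIII";
my @patterns = ('BBB', 'DDD');
my @index;

foreach my $pattern (@patterns) {
    pos($string) = 0;    # start every pattern from the beginning
    # \G pins the match to pos(); the lazy (?:.{3})*? then steps forward
    # three characters at a time, so group 1 can only start at 0, 3, 6, ...
    while ($string =~ /\G(?:.{3})*?(\Q$pattern\E)/gs) {
        push @index, $-[1];          # offset where group 1 began
        pos($string) = $-[1] + 3;    # resume at the next multiple of three
    }
}

print "@index\n";    # prints "3": BBB is at offset 3, DDD sits at offset 10
```

Because pos() is set explicitly after each hit, the next attempt always starts on a multiple of three, whatever the length of the pattern.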

      Thanks for the response. The following regex worked properly. I realize that using a quantifier after a capture group is bad programming practice, but I do not know how else to do what I want.

      my $string="AAABBBCCCCDDDEEEFFFGGGHHHIII";
      my @patterns=('BBB','DDD');
      my @index;
      foreach(@patterns){
          while($string=~m/\G([A-Z]{3})+?$_/g){
              push(@index,$-[2]);
              pos($string)=$-[2];
          }
      If there is a better way to do this, please let me know. Thanks.
        The following regex worked properly. ... If there is a better way to do this ...

        Even after adding (just for the heck of it) a closing  } brace, all I get is an infinite loop. What code are you running?

        >perl -wMstrict -le
        "my $string=\"AAABBBCCCCDDDEEEFFFGGGHHHIII\";
         my @patterns=('BBB','DDD');
         my @index;
         foreach(@patterns){
           while($string=~m/\G([A-Z]{3})+?$_/g){
             push(@index,$-[2]);
             pos($string)=$-[2];
             print 0+@index;
             <STDIN>;
           }
         }
        "
        1
        2
        3
        4
        5
        ...
        Terminating on signal SIGINT(2)

        Putting a repeating quantifier after a capture simply does not do what you intend it to do, unless you only want to capture the last matched item in the repetition. And even then, it would be clearer, in my opinion, to state that assumption explicitly by discarding the earlier repetitions in a non-capturing group:

        /(?:[A-Z]{3})*([A-Z]{3})$/
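A small illustration of the point above, using a made-up sample string: a quantified capture group keeps only its final repetition, while a list-context /g match returns every group. It also shows that a pattern with a single pair of capturing parentheses has no group 2, so $-[2] is undefined; assigning undef to pos() resets the search position, which is a plausible cause of the endless loop seen above.

```perl
use strict;
use warnings;

my $sample = "AAABBBCCC";    # hypothetical input, for illustration only

# The group repeats three times, but $1 keeps only the last repetition.
my ($last) = $sample =~ /^([A-Z]{3})+$/;

# In list context with /g, each match of the group is returned in turn.
my @all = $sample =~ /([A-Z]{3})/g;

# Only one capture group exists, so the offset of "group 2" is undefined.
my $group2_offset;
$group2_offset = $-[2] if $sample =~ /([A-Z]{3})+?CCC/;

print "last: $last\n";    # prints "last: CCC"
print "all: @all\n";      # prints "all: AAA BBB CCC"
print "group 2 offset is ",
    defined $group2_offset ? "defined" : "undefined", "\n";
```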
Re: Perl Regex Repeating Patterns
by JavaFan (Canon) on Jan 17, 2012 at 22:17 UTC
    Untested:
    foreach my $pattern (@patterns) {
        for (my $i = 0; $i < length($string); $i += 3) {
            push @matches, ${^MATCH} if substr($string, $i) =~ /^$pattern/p;
        }
    }
    It shouldn't be hard to adapt if you want the offsets (as your original code does). It does capture overlapping matches, although with the given patterns, no overlap is possible.
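As a sketch of the adaptation to offsets mentioned above: the loop index $i is already the offset, so it can be pushed directly. The helper name aligned_offsets, its $step parameter, the \Q...\E quoting, and the second sample string are assumptions made for illustration, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets (multiples of $step) at which any
# of the literal @patterns begins in $string, grouped by pattern.
sub aligned_offsets {
    my ($string, $step, @patterns) = @_;
    my @offsets;
    foreach my $pattern (@patterns) {
        for (my $i = 0; $i < length($string); $i += $step) {
            # Anchor with ^ so the pattern must start exactly at offset $i.
            push @offsets, $i if substr($string, $i) =~ /^\Q$pattern\E/;
        }
    }
    return @offsets;
}

my @first  = aligned_offsets("AAABBBCCCCDDDEEEFFFGGGHHHIII", 3, 'BBB', 'DDD');
my @second = aligned_offsets("AAABBBCCCDDDBBB",              3, 'BBB', 'DDD');

print "@first\n";     # prints "3"
print "@second\n";    # prints "3 12 9"
```

Each call to substr copies the tail of the string, which is fine for short inputs; for long strings, setting pos() and matching with \G avoids the copies.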
Re: Perl Regex Repeating Patterns
by Anonymous Monk on Jan 17, 2012 at 21:08 UTC
    Sorry, I am missing a } at the end of the code. This is not the problem I am having.