PerlMonks  

Perl Regex Repeating Patterns

by Anonymous Monk
on Jan 17, 2012 at 21:06 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a problem I am trying to solve with regular expressions. I want to return all locations of an array of patterns inside a string. I also only want the patterns to match at locations which are a multiple of 3. For example, so far I have the following.

my $string="AAABBBCCCCDDDEEEFFFGGGHHHIII";
my @patterns=('BBB','DDD');
my @index;
foreach(@patterns){
    while($string=~m/([A-Z]{3})+?$_/g){
        push(@index,$-[2]);
        pos($string)=$-[2];
    }

The problem with this is that it matches the first pattern correctly, but it will also match 'CCCDDD' in the second loop, because I do not know how to anchor the pattern to start at a specific location.

To be clear, I only want the patterns in the array to match if there are a number of characters evenly divisible by three before it.

Is there a good way to do this?

Thanks

Re: Perl Regex Repeating Patterns
by Anonymous Monk on Jan 17, 2012 at 21:08 UTC
    Sorry, I am missing a } at the end of the code. This is not the problem I am having.
Re: Perl Regex Repeating Patterns
by Corion (Pope) on Jan 17, 2012 at 21:21 UTC

    Having a quantifier after capturing parentheses is almost always an error.

    Maybe you want to directly populate your list?

    my @matches = ($string =~ /([A-Z]{3})/g);

    but that approach will not work if your input data contains non-alphabetical chars, like:

    AAA..BBB

    If you want a match to start at a specific position, see perlre on \G.
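A minimal sketch of the \G idea, assuming the goal is the offsets from the original question. It is not code from this thread: the non-capturing `(?:.{3})*?` skip and the `pos($string) = $-[1] + 3` step are illustrative choices. \G pins each attempt to the current pos(), so the pattern can only begin at offsets 0, 3, 6, and so on.

```perl
use strict;
use warnings;

my $string   = "AAABBBCCCCDDDEEEFFFGGGHHHIII";
my @patterns = ('BBB', 'DDD');
my @index;

for my $pattern (@patterns) {
    pos($string) = 0;    # restart the scan for each pattern
    # \G anchors the attempt at pos(); (?:.{3})*? skips whole 3-character
    # chunks without capturing, so ($pattern) can only begin at 0, 3, 6, ...
    while ($string =~ /\G(?:.{3})*?($pattern)/gs) {
        push @index, $-[1];          # offset where the pattern starts
        pos($string) = $-[1] + 3;    # resume at the next 3-character boundary
    }
}

print "@index\n";    # prints "3"
```

'DDD' is not reported because it starts at offset 10, which is not a multiple of 3.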

      Thanks for the response. The following regex worked properly. I realize that using a quantifier after a capture group is bad programming practice, but I do not know how else to do what I want:

      my $string="AAABBBCCCCDDDEEEFFFGGGHHHIII";
      my @patterns=('BBB','DDD');
      my @index;
      foreach(@patterns){
          while($string=~m/\G([A-Z]{3})+?$_/g){
              push(@index,$-[2]);
              pos($string)=$-[2];
          }
      If there is a better way to do this, please let me know. Thanks.
        The following regex worked properly. ... If there is a better way to do this ...

        Even after adding (just for the heck of it) a closing } brace, all I get is an infinite loop. What code are you running?

        >perl -wMstrict -le
        "my $string=\"AAABBBCCCCDDDEEEFFFGGGHHHIII\";
         my @patterns=('BBB','DDD');
         my @index;
         foreach(@patterns){
           while($string=~m/\G([A-Z]{3})+?$_/g){
             push(@index,$-[2]);
             pos($string)=$-[2];
             print 0+@index;
             <STDIN>;
             }
           }
         "
        1
        2
        3
        4
        5
        ...
        Terminating on signal SIGINT(2)
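One likely explanation for the loop, offered as a reading of the posted code rather than something stated in the thread: the regex has only one capture group, so `$-[2]` is undef, and assigning undef to `pos($string)` resets the search position to the start of the string. The sketch below reproduces that with a single pattern; the cap of 5 iterations and the `@seen` array are additions for illustration.

```perl
use strict;
use warnings;

my $string     = "AAABBBCCCCDDDEEEFFFGGGHHHIII";
my $pattern    = 'BBB';
my $iterations = 0;
my @seen;

while ($string =~ m/\G([A-Z]{3})+?$pattern/g) {
    # Only capture group 1 exists, so $-[2] is undef on every match.
    push @seen, defined $-[2] ? $-[2] : 'undef';
    pos($string) = $-[2];           # undef resets pos() to the start
    last if ++$iterations >= 5;     # safety cap; the loop never ends without it
}

print "@seen\n";
```

Every iteration matches the same 'AAABBB' prefix again, which is consistent with the ever-growing count shown above.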

        Putting a repeating quantifier after a capture simply does not do what you intend it to do, unless you only want to capture the last matched item in the repetition. And even then, it would be clearer in my opinion to explicitly state your assumption, by explicitly discarding the first repetitions:

        /(?:[A-Z]{3})*([A-Z]{3})$/
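A small sketch of that point, using a made-up input string: a quantified capture group is overwritten on each repetition, so only the last chunk survives, and the explicit form gives the same result with the intent visible. The leading `^` anchors are added here for the demonstration.

```perl
use strict;
use warnings;

my $text = "AAABBBCCC";

# The capture group is overwritten on every repetition,
# so only the final 3-character chunk is returned.
my ($last_quantified) = $text =~ /^([A-Z]{3})+$/;

# Same result, with the earlier repetitions explicitly discarded.
my ($last_explicit) = $text =~ /^(?:[A-Z]{3})*([A-Z]{3})$/;

print "$last_quantified $last_explicit\n";    # prints "CCC CCC"
```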
Re: Perl Regex Repeating Patterns
by JavaFan (Canon) on Jan 17, 2012 at 22:17 UTC
    Untested:
    foreach my $pattern (@patterns) {
        for (my $i = 0; $i < length($string); $i += 3) {
            push @matches, ${^MATCH} if substr($string, $i) =~ /^$pattern/p;
        }
    }
    It shouldn't be hard to adapt if you want the offsets (as your original code does). It does capture overlapping matches, although with the given patterns, no overlap is possible.
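One way the adaptation to offsets might look, as a sketch only: the loop is the same as above, but it records `$i` instead of the matched text. The sub name `aligned_offsets` is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every offset that is a multiple of 3
# at which one of the given patterns matches.
sub aligned_offsets {
    my ($string, @patterns) = @_;
    my @offsets;
    for my $pattern (@patterns) {
        for (my $i = 0; $i < length $string; $i += 3) {
            # Anchor with ^ so the pattern must start exactly at offset $i.
            push @offsets, $i if substr($string, $i) =~ /^$pattern/;
        }
    }
    return @offsets;
}

my @found = aligned_offsets("AAABBBCCCCDDDEEEFFFGGGHHHIII", 'BBB', 'DDD');
print "@found\n";    # prints "3"
```

With the string from the question only 'BBB' is reported, since 'DDD' begins at offset 10.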
