in reply to self limiting regex help

If your string is allowed to be empty, try this:
print "match" if $string =~ /^[atcg]*([RYMKSWHBVDN][atcg]*){0,2}$/i;
If not, try this:
print "match" if $string and $string =~ /^[atcg]*([RYMKSWHBVDN][atcg]*){0,2}$/i;
(Warning, neither was tested)

Update: Sorry about that. You're both right; I misunderstood the initial request. The regex above matches up to 2 occurrences of any of the alternate codes, not any number of occurrences of up to 2 distinct alternate codes.

You could do what you want with an extremely long regex which enumerates every combination of 2 alternate codes (55 pairs for the 11 codes). That's a really bad answer, though: it would be much shorter and more straightforward to do it with regexes supplemented by other Perl code.
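Something along these lines might do it. This is only a sketch: the sub name within_qc and its $max parameter are made up for illustration, and it assumes the sequence may contain nothing but a/t/c/g and the ambiguity codes, in either case.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original question): true if $seq holds
# only a/t/c/g plus at most $max DISTINCT ambiguity codes, each of which
# may occur any number of times.
sub within_qc {
    my ($seq, $max) = @_;
    $max = 2 unless defined $max;

    # Reject any character outside the two alphabets (empty string passes).
    return 0 unless $seq =~ /\A[atcgRYMKSWHBVDN]*\z/i;

    # Record each distinct ambiguity code, ignoring case.
    my %seen;
    $seen{ uc $1 }++ while $seq =~ /([RYMKSWHBVDN])/gi;

    my $distinct = keys %seen;
    return $distinct <= $max ? 1 : 0;
}

print "match\n"    if     within_qc('atcgRRatWWcgRW');   # two codes: R and W
print "no match\n" unless within_qc('atRcgWatNcg');      # three codes: R, W, N
```

The regexes only validate the alphabet and pick out the codes; the hash does the counting, which is the part a plain quantifier can't express.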

Sorry for the wrong answer :)


Re: Re: self limiting regex help
by spq (Friar) on May 22, 2002 at 15:34 UTC

    Hmm, that looks like it would match fine. But I don't see how it would limit a string to containing any number of occurrences of only two of the alternate codes?

    In case I wasn't clear in my first posting, the regex should match on a string that is within the QC criteria, but fail if not. So:


    Would be matched, but:


    Wouldn't, because the N near the end introduces a third ambiguity code.

Re: Re: self limiting regex help
by dsheroh (Monsignor) on May 22, 2002 at 15:37 UTC
    That doesn't solve the "any number of no more than two different characters" part. Both solutions posted so far will match up to 2 non-atcg characters, but I read the spec as allowing for, say, 50 Rs and 100 Ws to be mixed in, but not one each of Y, H, and N.

    I could be wrong, but I have a feeling that this one can't be solved by a single regex. (Or at least not one written by a mere mortal - there are some regex deities floating around here who might prove me wrong...)
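For what it's worth, backreferences plus a negative lookahead look like one way to get there in a single pattern. The sketch below is an assumption about what the spec means (at most two distinct ambiguity codes, any number of each, case ignored); the name $qc_re is made up, and it has not been run against real sequence data.

```perl
use strict;
use warnings;

# Hypothetical pattern: the first ambiguity code seen is captured as \1,
# an optional different code as \2, and nothing else is allowed after that.
my $qc_re = qr/
    \A
    [atcg]*                          # leading plain bases
    (?:
        ([RYMKSWHBVDN])              # first ambiguity code, captured as \1
        (?: [atcg] | \1 )*           # plain bases or more of the same code
        (?:
            (?!\1) ([RYMKSWHBVDN])   # a different code, captured as \2
            (?: [atcg] | \1 | \2 )*  # plain bases or either captured code
        )?
    )?
    \z
/xi;

for my $seq ('atcgRRatWWcgRW', 'atRcgWatNcg') {
    print "$seq: ", ($seq =~ $qc_re ? "match" : "no match"), "\n";
}
```

A third distinct code has nowhere to go: it is not a plain base and equals neither capture, so the match fails. Whether this is more readable than counting the codes in ordinary Perl is another question.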