### Re^11: Comparing 2 different-sized strings

by BrowserUk (Pope)
 on Aug 18, 2013 at 13:41 UTC

in reply to Re^10: Comparing 2 different-sized strings
in thread Comparing 2 different-sized strings

If I am searching for 2 sequences within the same haystack, and what separates the 2 sequences is always a "T" followed by one other nucleotide (either A, G, C, or T),

Could you explain that a bit more?

I get that you are looking for ???...????T[acgt]???..???; but that criterion will match everywhere a T occurs in a sequence, other than when it is the first, or the second- or third-last, character in the sequence. And without some constraints on the lengths of the sequences before and after the T, there would be multiple (100s or 1000s or millions of) possible matches at every T position.

### Re^12: Comparing 2 different-sized strings

by AdrianJ217 (Novice)
 on Aug 18, 2013 at 16:19 UTC

Hi, I'm not just looking for the T. For example, if I have the following sequence:

```
my $hay  = 'AACCCAGGATGCGCCATGCAGGACACAGGACGCCACGGAA';
my $nee1 = 'AGGA';
my $nee2 = 'CGCCAC';
```

What I want is the following, as a regular expression:

```
$hay =~ /($nee1)T[ATGC]($nee2)/
```

So I only want \$nee1 when it is directly followed by a T, some other nucleotide and \$nee2. I don't want \$nee1 and \$nee2 anywhere else.
Then use the regular expression. It is perfect for that usage.

fuzzyMatch() is not designed for that type of matching.

I would use the regular expression, except that there may be up to 2 mismatches in \$nee1 and/or \$nee2. In that case I need to use the fuzzyMatch subroutine, don't I? Is there a way to incorporate the fuzzyMatch subroutine, which is perfect for mismatches, into the regular expression?
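
One possible way to combine the two requirements, offered only as a rough sketch: rather than embedding a fuzzy matcher in the regex, slide a fixed-width window along the haystack, require the exact T[ACGT] spacer in the middle, and count mismatches on each side with a string XOR. The names `mismatches` and `fuzzy_pair` are made up for this illustration; the code does not call the thread's fuzzyMatch(), whose interface is not shown here. It also assumes that the limit of 2 applies to each needle separately and that only substitutions count as mismatches (no insertions or deletions).

```perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ.
# XOR-ing the strings gives "\0" wherever the bytes agree, so the
# number of non-NUL bytes is the number of mismatches.
sub mismatches {
    my ( $x, $y ) = @_;
    my $diff = $x ^ $y;
    return scalar( $diff =~ tr/\0//c );
}

# Hypothetical helper: return every 0-based offset in $hay where
# $nee1, then T, then any nucleotide, then $nee2 occur back to back,
# allowing up to $max substitutions in each needle separately.
sub fuzzy_pair {
    my ( $hay, $nee1, $nee2, $max ) = @_;
    my ( $l1, $l2 ) = ( length $nee1, length $nee2 );
    my @hits;
    for my $i ( 0 .. length($hay) - ( $l1 + 2 + $l2 ) ) {
        # The spacer must match exactly: T followed by one nucleotide.
        next unless substr( $hay, $i + $l1, 2 ) =~ /\AT[ACGT]\z/;
        next if mismatches( substr( $hay, $i, $l1 ), $nee1 ) > $max;
        next if mismatches( substr( $hay, $i + $l1 + 2, $l2 ), $nee2 ) > $max;
        push @hits, $i;
    }
    return @hits;
}

# Sample data from the post above.
my $hay  = 'AACCCAGGATGCGCCATGCAGGACACAGGACGCCACGGAA';
my $nee1 = 'AGGA';
my $nee2 = 'CGCCAC';

print "hit at offset $_\n" for fuzzy_pair( $hay, $nee1, $nee2, 2 );
```

On the sample data the exact regular expression finds nothing, because the only place where AGGA is followed by T and one more nucleotide continues with CGCCAT rather than CGCCAC. With a limit of 2 the sketch reports that one position, offset 5; with a limit of 0 it reports none.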

