http://www.perlmonks.org?node_id=1002923


in reply to Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
in thread fuzzy match: trim sequences outside of the forward and reverse primer set.

Hi grizzley, by "fuzzy match" I mean a match that is not perfect. For example, suppose I have a forward primer "agct" and I want to find it in the following sequences and trim off the region before it:

>seq1

aaagctcccc

>seq2

aaacctgggg

If I perform a "perfect match" search and trim, only seq1 contains "agct". After the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.

For a "fuzzy match" search that allows up to 1 or 2 mismatches to "agct", both seq1 and seq2 would be trimmed. seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch (a "c" substituted for the "g"). After the trim, the result should be:

>seq1

agctcccc

>seq2

acctgggg

But there are many possible combinations: with 1 mismatch to "agct", the match could be "acct", "ggct", and so on. "Bio::Grep" can do such a "fuzzy match" search, but it only outputs the sequences that contain such regions; I don't think it performs the trimming as a downstream step. How can I write a Perl program to do this? I would really appreciate your help.

Re^3: fuzzy match: trim sequences outside of the forward and reverse primer set.
by grizzley (Chaplain) on Nov 09, 2012 at 07:46 UTC
    In that case there are at least two possibilities:
      1. If Bio::Perl can match with wildcards, you can do a fuzzy match on 'agct.*'
      2. Do the fuzzy match with Bio::Perl and use the returned matched string for a perfect match, or better, a substitution: s/.*?(?=$returnedstring)//
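The substitution in option 2 can be tried on its own once the matched string is known. A minimal sketch, assuming the matched string is available in a plain scalar; the helper name trim_before_exact is made up for illustration and is not part of any Bio module:

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything before the first exact
# occurrence of $primer, keeping the primer itself.
sub trim_before_exact {
    my ($seq, $primer) = @_;
    # \Q...\E quotes any regex metacharacters in $primer;
    # the lookahead (?=...) matches the primer without consuming it.
    (my $trimmed = $seq) =~ s/.*?(?=\Q$primer\E)//s;
    return $trimmed;
}

print trim_before_exact('aaagctcccc', 'agct'), "\n";  # agctcccc
print trim_before_exact('aaacctgggg', 'agct'), "\n";  # aaacctgggg (no match, untouched)
```

If the string is not found, the substitution does nothing and the sequence comes back unchanged, which matches the "untouched" behaviour asked for.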

      I don't think it returns the fuzzy-matched string, only the sequence containing it. Therefore I have no way to know which string was found. Any more suggestions?

        Not much. I thought the Perl package would behave in a Perl-ish way, returning the matched part of the string and storing the "before" and "after" parts somewhere. What remains is to implement the fuzzy matching yourself.

        If the fuzzy matching is defined by the number of differences between the strings, then maybe Text::Levenshtein is of use? I mean iterating in a simple loop:

        for (0 .. length($str) - $len_of_match) {
            if (Text::Levenshtein::distance(substr($str, $_, $len_of_match), $matchstring) <= $differences_limit) {
                $found = 1;   # Perl has no 'true' keyword
                last;
            }
        }
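The same sliding-window idea can be turned into a trimming routine. This is only a sketch under stated assumptions: it counts substitutions only (a Hamming distance over equal-length windows, swapped in here for Text::Levenshtein so that it runs with core Perl), it trims at the leftmost window within the limit, and the names mismatches and trim_before_fuzzy are invented for illustration.

```perl
use strict;
use warnings;

# Count positions where two equal-length strings differ (substitutions only).
sub mismatches {
    my ($x, $y) = @_;
    # String XOR gives "\0" wherever the characters agree.
    my $xor = $x ^ $y;
    # tr with /c counts the bytes that are NOT "\0", i.e. the differences.
    return ($xor =~ tr/\0//c);
}

# Hypothetical helper: trim everything before the leftmost window that is
# within $max_mismatch substitutions of $primer. Returns $seq unchanged
# when no window qualifies.
sub trim_before_fuzzy {
    my ($seq, $primer, $max_mismatch) = @_;
    my $len = length $primer;
    for my $pos (0 .. length($seq) - $len) {
        my $window = substr($seq, $pos, $len);
        return substr($seq, $pos) if mismatches($window, $primer) <= $max_mismatch;
    }
    return $seq;
}

print trim_before_fuzzy('aaagctcccc', 'agct', 1), "\n";  # agctcccc
print trim_before_fuzzy('aaacctgggg', 'agct', 1), "\n";  # acctgggg
```

Note that the leftmost-window rule matters once the limit grows: with 2 mismatches allowed, seq2 would be cut at "aacc" (2 mismatches) rather than at "acct" (1 mismatch). Picking the window with the fewest mismatches instead would need a full pass over the sequence before trimming.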