http://www.perlmonks.org?node_id=1002833


in reply to fuzzy match: trim sequences outside of the forward and reverse primer set.

I couldn't find the word 'fuzzy' in the Bio::Perl docs. Can you explain what a fuzzy match in Bio::Perl does (which operation is it?) and what the criteria are for trimming the sequence after the match?

Re^2: fuzzy match: trim sequences outside of the forward and reverse primer set.
by lrl1997 (Novice) on Nov 08, 2012 at 15:24 UTC

    Hi grizzley. By "fuzzy match" I mean "not a perfect match". For example, suppose I have a forward primer "agct" and I want to find it in the following sequences and trim off the region before it:

    >seq1

    aaagctcccc

    >seq2

    aaacctgggg

    If I perform a "perfect match" search and trim, only seq1 contains "agct". After the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.

    For a "fuzzy match" search, if I allow up to 1 or 2 mismatches to "agct", then both seq1 and seq2 would be trimmed. seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch ("g" replaced by "c"). After the trim, the result should be:

    >seq1

    agctcccc

    >seq2

    acctgggg

    But there are many possible combinations: with 1 mismatch to "agct", the match could be "acct", "ggct", etc. Bio::Grep can do such a "fuzzy match" search, but it only outputs the sequences that contain such regions; I think it does not perform the trimming as a downstream step. I do not know how to write a Perl program to do this. I would really appreciate your help.
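One way to do the search and the trim in a single pass, without Bio::Perl or Bio::Grep, is to slide a primer-sized window along the sequence and count mismatches (Hamming distance). The following is only a minimal sketch in plain Perl; the names `hamming` and `trim_before_primer` are made up for illustration, and it assumes mismatches are substitutions only (no insertions or deletions).

```perl
use strict;
use warnings;

# Count mismatching positions between two equal-length strings.
sub hamming {
    my ($x, $y) = @_;
    # String XOR gives a NUL byte wherever the two characters agree.
    my $xor = $x ^ $y;
    # Count the bytes that are not NUL, i.e. the mismatches.
    return ( $xor =~ tr/\0//c );
}

# Remove everything before the leftmost window that matches $primer
# with at most $max mismatches. The sequence is returned unchanged
# when no such window exists.
sub trim_before_primer {
    my ($seq, $primer, $max) = @_;
    my $len = length $primer;
    for my $i (0 .. length($seq) - $len) {
        my $window = substr $seq, $i, $len;
        return substr($seq, $i)
            if hamming(lc($window), lc($primer)) <= $max;
    }
    return $seq;
}

my %seqs = (seq1 => 'aaagctcccc', seq2 => 'aaacctgggg');
for my $id (sort keys %seqs) {
    print ">$id\n", trim_before_primer($seqs{$id}, 'agct', 1), "\n";
}
```

With 1 mismatch allowed this prints "agctcccc" for seq1 and "acctgggg" for seq2. Note that the sketch cuts at the leftmost acceptable window, not the best one: with 2 mismatches allowed, seq2 would be cut one base earlier, at "aacc", giving "aacctgggg".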

      In that case there are at least two possibilities:
        1. If Bio::Perl can match with wildcards, you can do the fuzzy match 'agct.*'
        2. Do the fuzzy match with Bio::Perl and use the returned matched string for a perfect match, or better, a substitution: s/.*?(?=$returnedstring)//
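Both ideas can also be combined in plain Perl, which sidesteps the question of what Bio::Perl returns. The sketch below builds a wildcard regex that tolerates one substituted base, captures the string that actually matched, and then applies the substitution from point 2. The names `one_mismatch_regex` and `trim_with_regex` are hypothetical, and only a single substitution is handled (no indels, no second mismatch).

```perl
use strict;
use warnings;

# Build a regex that matches $primer with at most one substituted base:
# each alternative replaces one position with the wildcard ".".
# For "agct" this yields .gct|a.ct|ag.t|agc.
sub one_mismatch_regex {
    my ($primer) = @_;
    my @bases = map { quotemeta($_) } split //, $primer;
    my @alts;
    for my $i (0 .. $#bases) {
        my @copy = @bases;
        $copy[$i] = '.';
        push @alts, join '', @copy;
    }
    my $alternation = join '|', @alts;
    return qr/$alternation/i;
}

# Return (trimmed sequence, matched string). When nothing matches,
# the sequence is returned unchanged and the matched string is undef.
sub trim_with_regex {
    my ($seq, $primer) = @_;
    my $re = one_mismatch_regex($primer);
    return ($seq, undef) unless $seq =~ /($re)/;
    my $found = $1;
    # Drop everything before the string that was actually found.
    (my $trimmed = $seq) =~ s/^.*?(?=\Q$found\E)//;
    return ($trimmed, $found);
}

for my $seq ('aaagctcccc', 'aaacctgggg') {
    my ($trimmed, $found) = trim_with_regex($seq, 'agct');
    print defined $found ? "$found -> $trimmed\n" : "no match: $seq\n";
}
```

Because the capture group hands back the matched text ("agct" for seq1, "acct" for seq2), the trimming step no longer depends on the search tool reporting it.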

        I don't think it returns the fuzzy-matched string, only the sequence containing it. Therefore, I have no way of knowing which string was found. Any more suggestions?