PerlMonks  

Re: fuzzy match: trim sequences outside of the forward and reverse primer set.

by grizzley (Chaplain)
on Nov 08, 2012 at 08:29 UTC


in reply to fuzzy match: trim sequences outside of the forward and reverse primer set.

I couldn't find the word 'fuzzy' in the Bio::Perl documentation. Can you explain what a fuzzy match in Bio::Perl does (which operation is it?) and what the criteria are for trimming the sequence after the match?


Re^2: fuzzy match: trim sequences outside of the forward and reverse primer set.
by lrl1997 (Novice) on Nov 08, 2012 at 15:24 UTC

    Hi grizzley, by "fuzzy match" I mean "not a perfect match". For example, suppose I have the forward primer "agct" and I want to find it in the following sequences and trim off the region before it:

    >seq1

    aaagctcccc

    >seq2

    aaacctgggg

    If I perform a perfect-match search and trim, only seq1 contains "agct"; after trimming, seq1 becomes "agctcccc", because I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.
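The perfect-match case can be sketched in plain Perl with the built-ins `index` and `substr`, without BioPerl. This is a minimal sketch assuming single-line sequences held in a hash; `trim_exact` is a hypothetical helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything before the first exact
# (case-insensitive) occurrence of the primer, keeping the primer.
# Sequences without the primer come back unchanged.
sub trim_exact {
    my ($seq, $primer) = @_;
    my $pos = index(lc($seq), lc($primer));   # -1 when not found
    return $pos < 0 ? $seq : substr($seq, $pos);
}

my %seqs = (seq1 => 'aaagctcccc', seq2 => 'aaacctgggg');
for my $id (sort keys %seqs) {
    print ">$id\n", trim_exact($seqs{$id}, 'agct'), "\n";
}
```

For the two example sequences this prints seq1 as "agctcccc" and leaves seq2 as "aaacctgggg".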

    For a fuzzy-match search that allows up to 1 or 2 mismatches against "agct", both seq1 and seq2 would be trimmed: seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch (the "g" substituted by "c"). After trimming, the result should be:

    >seq1

    agctcccc

    >seq2

    acctgggg

    However, there are many possible combinations: with 1 mismatch, "agct" could appear as "acct", "ggct", etc. Bio::Grep can do this kind of fuzzy-match search, but it only outputs the sequences that contain such regions; as far as I can tell, it does not do the trimming as a downstream step. I don't know how to write a Perl program to do this, and I would really appreciate your help.
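One way to handle mismatches without enumerating every variant is to slide a primer-length window along each sequence and count substitutions (the Hamming distance) in each window. This is a sketch under stated assumptions: substitutions only (no insertions or deletions), only the given strand is searched, and `mismatches` and `trim_fuzzy` are hypothetical helper names, not Bio::Grep or BioPerl functions:

```perl
use strict;
use warnings;

# Hypothetical helper: count positions where two equal-length strings
# differ (substitutions only; indels are not modelled).
sub mismatches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $n;
}

# Hypothetical helper: find the primer-length window with the fewest
# mismatches (leftmost on ties), provided it has at most $max_mm; trim
# everything before it and keep the window itself. If no window
# qualifies, the sequence is returned untouched.
sub trim_fuzzy {
    my ($seq, $primer, $max_mm) = @_;
    my $len = length $primer;
    my ($best_start, $best_mm);
    for my $start (0 .. length($seq) - $len) {
        my $mm = mismatches(lc(substr($seq, $start, $len)), lc($primer));
        if ($mm <= $max_mm && (!defined $best_mm || $mm < $best_mm)) {
            ($best_start, $best_mm) = ($start, $mm);
        }
    }
    return defined $best_start ? substr($seq, $best_start) : $seq;
}

my %seqs = (seq1 => 'aaagctcccc', seq2 => 'aaacctgggg');
for my $id (sort keys %seqs) {
    print ">$id\n", trim_fuzzy($seqs{$id}, 'agct', 1), "\n";
}
```

This gives "agctcccc" and "acctgggg", matching the expected output above. The sketch keeps the best window rather than the first acceptable one: with 2 mismatches allowed, the first acceptable window in seq2 is "aacc" at offset 1, which would leave an extra "a" in front of "acct".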

      In that case there are at least two possibilities:
        1. If Bio::Perl can match with wildcards, you can do a fuzzy match with 'agct.*'.
        2. Do the fuzzy match with Bio::Perl and use the returned matched string to do a perfect match, or, better, a substitution: s/.*?(?=$returnedstring)//
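If a tool did report the matched substring, option 2 could be sketched as follows. The value of `$matched` is hard-coded here as a hypothetical stand-in for what such a tool would return; `\Q...\E` is added so the matched text is treated literally rather than as a regex:

```perl
use strict;
use warnings;

# Hypothetical input: the sequence and the substring a fuzzy matcher reported.
my $seq     = 'aaacctgggg';
my $matched = 'acct';

# Delete the shortest prefix that is followed by the matched string.
# If $matched does not occur, the substitution does nothing.
(my $trimmed = $seq) =~ s/.*?(?=\Q$matched\E)//;
print "$trimmed\n";
```

Here `$trimmed` becomes "acctgggg", while `$seq` itself is left unmodified.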

        I don't think it returns the fuzzy-matched string, only the sequence containing it, so I have no way to know which string was found. Any more suggestions?
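If the tool only reports which sequences matched, the matched substring can be recovered afterwards with an ordinary Perl regex that tolerates one substitution, in the spirit of the wildcard idea in option 1. The idea is to replace each primer position in turn with `.`, join the variants with `|`, and read the hit's offset from Perl's `@-` array. This is a sketch assuming at most one substitution; `one_mismatch_regex` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching the primer with at most
# one substitution, e.g. "agct" -> (.gct|a.ct|ag.t|agc.), case-insensitive.
sub one_mismatch_regex {
    my ($primer) = @_;
    my @alts;
    for my $i (0 .. length($primer) - 1) {
        push @alts, quotemeta(substr($primer, 0, $i)) . '.'
                  . quotemeta(substr($primer, $i + 1));
    }
    my $alt = join '|', @alts;
    return qr/($alt)/i;
}

my $re = one_mismatch_regex('agct');
for my $seq ('aaagctcccc', 'aaacctgggg', 'tttttttttt') {
    if ($seq =~ $re) {
        my ($found, $start) = ($1, $-[1]);   # matched text and its offset
        print "found '$found' at $start, trimmed: ", substr($seq, $start), "\n";
    } else {
        print "no match, untouched: $seq\n";
    }
}
```

A regex returns the leftmost hit, not the one with the fewest mismatches. For k mismatches the number of alternatives grows as C(n, k), so this approach is mainly practical for short primers and small k.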
