PerlMonks  

fuzzy match: trim sequences outside of the forward and reverse primer set.

by lrl1997 (Novice)
on Nov 08, 2012 at 00:50 UTC (#1002781)
lrl1997 has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

Basically, I want to trim DNA sequences using a forward and reverse primer set. The trimmed sequences should still contain the forward and reverse primers, but everything before and after them should be trimmed off.

Two strategies would be:
1) "Perfect match, then trim": find the locations in the sequence that perfectly match the forward and reverse primers, then trim. I wrote a script that does this very well.
2) "Fuzzy match, then trim": when searching for the primer sites in the sequence, allow up to 2 mismatches, then trim. I am having a hard time with this one. Bio::Grep can perform the fuzzy search, but how should I trim the sequence and print the trimmed sequence out?
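A minimal plain-Perl sketch of strategy 2 (it does not use Bio::Grep), assuming substitutions only (no insertions or deletions) and assuming the reverse primer is written 5'->3', so that its site appears in the read as its reverse complement. It takes the leftmost forward-primer window with at most `$max` mismatches, then the first reverse-primer site after it, and keeps everything in between, primers included. The helper names `mismatches`, `fuzzy_index`, `revcomp` and `trim_amplicon` are hypothetical.

```perl
use strict;
use warnings;

# Hamming distance: number of positions where two equal-length strings differ.
sub mismatches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $n;
}

# Offset of the leftmost window of $seq (starting at $from) that matches
# $primer with at most $max substitutions, or -1 if there is none.
sub fuzzy_index {
    my ($seq, $primer, $max, $from) = @_;
    $from //= 0;
    my $len = length $primer;
    for my $i ($from .. length($seq) - $len) {
        return $i if mismatches(substr($seq, $i, $len), $primer) <= $max;
    }
    return -1;
}

# Reverse complement of a DNA string (A/C/G/T only, case preserved).
sub revcomp {
    my $s = reverse shift;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical helper: keep the read from the start of the forward primer
# through the end of the reverse-primer site. ASSUMPTION: $rev is given 5'->3',
# so its site in the read is revcomp($rev). Returns undef if either is missing.
sub trim_amplicon {
    my ($seq, $fwd, $rev, $max) = @_;
    my $site  = revcomp($rev);
    my $start = fuzzy_index($seq, $fwd, $max);
    return if $start < 0;
    my $end = fuzzy_index($seq, $site, $max, $start + length($fwd));
    return if $end < 0;
    return substr($seq, $start, $end + length($site) - $start);
}

# Made-up read: forward primer "agctac" appears as "acctac" (1 mismatch),
# reverse primer "ttgacc" has site "ggtcaa", present as "ggtgaa" (1 mismatch).
my $read = "gggg" . "acctac" . "tttttt" . "ggtgaa" . "cccc";
my $kept = trim_amplicon($read, "agctac", "ttgacc", 1);
print defined $kept ? "$kept\n" : "primers not found\n";   # acctacttttttggtgaa
```

With short primers and a generous `$max`, the leftmost acceptable window can be a spurious hit, so a low mismatch limit (or picking the best-scoring window instead of the first) is the safer choice.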

Thank you all for any suggestions.
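For the "print the trimmed sequence out" part, a small FASTA reader/writer is enough in plain Perl (BioPerl's Bio::SeqIO is the library route). A minimal sketch, assuming each record is a `>` header followed by one or more sequence lines; the trimming rule is passed in as a code ref, and `trim_fasta` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from $in, pass each sequence through
# $trim (a code ref that takes and returns a sequence string), and write FASTA
# to $out. Blank lines are skipped; multi-line sequences are joined.
sub trim_fasta {
    my ($in, $out, $trim) = @_;
    my ($id, $seq);
    my $emit = sub {
        print {$out} ">$id\n", $trim->($seq), "\n" if defined $id;
    };
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(.*)/) {
            $emit->();                # flush the previous record
            ($id, $seq) = ($1, '');
        }
        else {
            $seq .= $line;
        }
    }
    $emit->();                        # flush the last record
    return;
}

# Usage with an in-memory file and an exact-match forward-primer trim.
my $fasta = ">seq1\naaagctcccc\n>seq2\naaacctgggg\n";
open my $fh, '<', \$fasta or die "open: $!";
trim_fasta($fh, \*STDOUT, sub {
    my $s   = shift;
    my $pos = index($s, 'agct');
    return $pos < 0 ? $s : substr($s, $pos);
});
# prints: >seq1 / agctcccc / >seq2 / aaacctgggg
```

Any of the trimming rules discussed in this thread can be dropped in as the code ref without touching the I/O loop.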

Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
by Anonymous Monk on Nov 08, 2012 at 02:25 UTC

    Thank you all for any suggestions.

    Hire a programmer :)

Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
by grizzley (Chaplain) on Nov 08, 2012 at 08:29 UTC
    I couldn't find the word 'fuzzy' in the Bio::Perl docs. Can you explain what a fuzzy match in Bio::Perl does (which operation is it?) and what the criteria are for trimming the sequence after the match?

      Hi grizzley, by "fuzzy match" I mean "not a perfect match". For example, suppose I have the forward primer "agct" and want to find it in the following sequences and trim off the region before it:

      >seq1

      aaagctcccc

      >seq2

      aaacctgggg

      If I perform a "perfect match" search and trim, only seq1 contains "agct"; after the trim, seq1 becomes "agctcccc", because I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.
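The perfect-match trim described here can be sketched with Perl's built-in `index`. A minimal sketch covering only the forward primer; `trim_exact` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper. Perfect-match trim: keep the sequence from the first
# exact occurrence of $primer onward; sequences without one are unchanged.
sub trim_exact {
    my ($seq, $primer) = @_;
    my $pos = index($seq, $primer);
    return $pos < 0 ? $seq : substr($seq, $pos);
}

print trim_exact("aaagctcccc", "agct"), "\n";   # agctcccc
print trim_exact("aaacctgggg", "agct"), "\n";   # aaacctgggg (untouched)
```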

      For a "fuzzy match" search that allows up to 1 or 2 mismatches to "agct", both seq1 and seq2 would be trimmed: seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch ("c" substituted for "g"). After the trim, they should be:

      >seq1

      agctcccc

      >seq2

      acctgggg

      But there can be many different combinations; with 1 mismatch, "agct" could appear as "acct", "ggct", etc. Bio::Grep can do this kind of "fuzzy match" search, but it only outputs the sequences that contain such regions; I don't think it performs the trimming as a downstream step. I don't know how to write a Perl program to do that, and I would really appreciate your help.
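One plain-Perl way around listing the combinations by hand is to build a single regex in which every choice of up to `$max` primer positions is replaced by the wildcard `.`; "acct", "ggct" and the other one-mismatch variants of "agct" are then all covered, and the cut is made at the leftmost hit. A minimal sketch, assuming substitutions only (no indels); `add_variants`, `mismatch_regex` and `trim_fuzzy` are hypothetical names, and this is not how Bio::Grep works internally.

```perl
use strict;
use warnings;

# Hypothetical helper: record @$chars plus every copy with up to $left more
# positions (index >= $from) replaced by the regex wildcard '.'.
sub add_variants {
    my ($chars, $from, $left, $out) = @_;
    $out->{ join '', @$chars } = 1;
    return if $left == 0;
    for my $i ($from .. $#$chars) {
        my @copy = @$chars;
        $copy[$i] = '.';
        add_variants(\@copy, $i + 1, $left - 1, $out);
    }
    return;
}

# Regex matching $primer with at most $max substitutions (no indels).
sub mismatch_regex {
    my ($primer, $max) = @_;
    my %alts;
    add_variants([ map { quotemeta } split //, $primer ], 0, $max, \%alts);
    my $alt = join '|', sort keys %alts;
    return qr/(?:$alt)/;
}

# Drop everything before the leftmost fuzzy hit; the primer itself is kept.
sub trim_fuzzy {
    my ($seq, $re) = @_;
    return $seq =~ $re ? substr($seq, $-[0]) : $seq;
}

my $re = mismatch_regex("agct", 1);
print trim_fuzzy("aaagctcccc", $re), "\n";   # agctcccc
print trim_fuzzy("aaacctgggg", $re), "\n";   # acctgggg
```

With a primer this short, the mismatch limit changes the answer: with `$max = 2`, the leftmost acceptable window in seq2 is "aacc" at offset 1, so seq2 becomes "aacctgggg". For a primer of length L the regex has the sum of C(L, k) alternatives for k from 0 to `$max` (211 for L = 20 and `$max` = 2).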

        In that case there are at least two possibilities:
          1. If Bio::Perl can match with wildcards, you can do a fuzzy match on 'agct.*'.
          2. Do the fuzzy match with Bio::Perl, then use the returned matched string for a perfect match or, better, a substitution: s/.*?(?=\Q$returnedstring\E)//
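Possibility 2 can be sketched as a self-contained sub. The hit string is supplied by hand here because the exact shape of Bio::Grep's results is not shown in this thread; `trim_at_hit` is a hypothetical name. `\Q...\E` makes the hit match literally even if it contained regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper for possibility 2: given the exact substring a fuzzy
# search reported as the hit, remove everything before its first occurrence.
sub trim_at_hit {
    my ($seq, $hit) = @_;
    $seq =~ s/^.*?(?=\Q$hit\E)//s;
    return $seq;
}

print trim_at_hit("aaacctgggg", "acct"), "\n";   # acctgggg
```

If the reported hit also occurs earlier in the read, this trims at that earlier copy; when the search reports a match position, cutting with `substr` at that position is safer.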
