
fuzzy match: trim sequences outside of the forward and reverse primer set.

by lrl1997 (Novice)
on Nov 08, 2012 at 00:50 UTC
lrl1997 has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

Basically, I want to trim DNA sequences using a forward and reverse primer set. The trimmed sequences should still contain the forward and reverse primers, but everything before the forward primer and after the reverse primer should be trimmed off.

I can think of two strategies:

1) "Perfect match then trim": find the locations in the sequences that perfectly match the forward and reverse primers, then trim. I wrote a script that does this very well.

2) "Fuzzy match then trim": when searching the sequences for the primers, allow up to 2 mismatches, then trim. This is the one I am having a difficult time with. Bio::Grep can perform the fuzzy search, but how should I trim the sequences and print the trimmed sequences out?

Thank you all for any suggestions.

Replies are listed 'Best First'.
Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
by Anonymous Monk on Nov 08, 2012 at 02:25 UTC

    Thank you all for any suggestions.

    Hire a programmer :)

Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
by grizzley (Chaplain) on Nov 08, 2012 at 08:29 UTC
    I couldn't find the word 'fuzzy' in the Bio::Perl docs. Can you explain what a fuzzy match in Bio::Perl does (which operation is it?) and what the criteria are for trimming the seq after the match?

      Hi grizzley, what I mean by "fuzzy match" is "not a perfect match". For example, say I have a forward primer "agct", and I want to find it in the following sequences and trim off the regions before it:





      If I perform a "perfect match" search and trim, only seq1 contains "agct". After the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.
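A minimal sketch of the "perfect match then trim" case for the forward primer only, using plain `index` and `substr`. The sub name `trim_before_exact` and the input sequences are made up for illustration (the original sequences are not shown above); only the primer "agct" and the result "agctcccc" come from the post.

```perl
use strict;
use warnings;

# Drop everything before the first exact occurrence of $primer.
# The primer itself is kept. If the primer is not found, the
# sequence is returned untouched.
sub trim_before_exact {
    my ($seq, $primer) = @_;
    my $pos = index($seq, $primer);
    return $pos < 0 ? $seq : substr($seq, $pos);
}

# Hypothetical inputs: the first contains "agct", the second only "acct".
print trim_before_exact('ttttagctcccc', 'agct'), "\n";   # agctcccc
print trim_before_exact('ttttacctcccc', 'agct'), "\n";   # ttttacctcccc (no exact hit)
```

Trimming after the reverse primer would work the same way with `rindex`, cutting at the match position plus the primer length.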

      For a "fuzzy match" search, if I allow up to 1 or 2 mismatches to "agct", then both seq1 and seq2 would be trimmed. seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch (the "g" substituted by "c"). So after the trim, the result is supposed to be:





      But there may be many different combinations: with 1 mismatch to "agct", the match could be "acct", "ggct", etc. Bio::Grep can do such a "fuzzy match" search, but it only outputs the sequences that contain such regions; I think it does not perform the trimming as a downstream step. I do not know how to write a Perl program to do this. I would really appreciate your help.
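One possible way to do the fuzzy search and the trimming together in plain Perl, without Bio::Grep, is to slide a primer-sized window along the sequence and count substitutions (Hamming distance). This is only a sketch under some assumptions: mismatches are substitutions only (no insertions or deletions), the reverse primer is given in the same orientation as the read (reverse-complement it first if needed), and the sub names, the reverse primer "ggaa", and the test sequence are all hypothetical.

```perl
use strict;
use warnings;

# Number of differing positions between two equal-length strings.
# String XOR gives "\0" wherever the bytes agree, so count the non-NUL bytes.
sub hamming {
    my ($x, $y) = @_;
    my $xor = lc($x) ^ lc($y);
    return $xor =~ tr/\0//c;
}

# All offsets in $seq where a window matches $primer with <= $max_mm mismatches.
sub fuzzy_positions {
    my ($seq, $primer, $max_mm) = @_;
    my $len = length $primer;
    return grep { hamming(substr($seq, $_, $len), $primer) <= $max_mm }
           0 .. length($seq) - $len;
}

# Keep the region from the leftmost forward-primer hit through the
# rightmost reverse-primer hit. Return $seq unchanged if either is missing.
sub trim_fuzzy {
    my ($seq, $fwd, $rev, $max_mm) = @_;
    my @f = fuzzy_positions($seq, $fwd, $max_mm);
    my @r = fuzzy_positions($seq, $rev, $max_mm);
    return $seq unless @f && @r && $r[-1] >= $f[0];
    return substr($seq, $f[0], $r[-1] + length($rev) - $f[0]);
}

# Hypothetical read: forward primer present as "acct" (1 mismatch to "agct").
print trim_fuzzy('ttttacctccccggaatttt', 'agct', 'ggaa', 1), "\n";   # acctccccggaa
```

With a primer as short as 4 bases, allowing 2 mismatches produces many spurious hits, so this only makes sense for realistic primer lengths. Taking the leftmost forward hit and the rightmost reverse hit is a design choice, not the only option; picking the hit with the fewest mismatches would be another.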

        In that case there are at least two possibilities:
          1. If Bio::Perl can match with wildcards, you can do the fuzzy match with 'agct.*'
          2. Do the fuzzy match with Bio::Perl and use the returned matched string to do a perfect match, or better, a substitution: s/.*?(?=$returnedstring)//
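Both suggestions can be sketched with ordinary Perl regexes. For the wildcard idea, one option is to replace each primer position in turn with `.`, which covers every single-substitution variant; for the second idea, the matched string (however it was obtained) is quoted with `\Q...\E` and used in the lookahead substitution. The sub names here are invented for illustration, the primer is assumed to contain only plain letters, and the sequences are hypothetical.

```perl
use strict;
use warnings;

# Build a pattern matching $primer with at most one substituted base,
# e.g. "agct" becomes (?:.gct|a.ct|ag.t|agc.).
sub one_mismatch_regex {
    my ($primer) = @_;
    my @alts = map {
        my $p = $primer;
        substr($p, $_, 1) = '.';   # wildcard at position $_
        $p;
    } 0 .. length($primer) - 1;
    my $alt = join '|', @alts;
    return qr/(?:$alt)/i;
}

# Delete everything before the first place $re matches; the match is kept
# because it sits inside a lookahead. No match leaves $seq unchanged.
sub trim_before {
    my ($seq, $re) = @_;
    (my $trimmed = $seq) =~ s/\A.*?(?=$re)//s;
    return $trimmed;
}

# Suggestion 1: wildcard-style fuzzy match, then trim.
print trim_before('ttttacctcccc', one_mismatch_regex('agct')), "\n";   # acctcccc

# Suggestion 2: trim using a matched string returned by some fuzzy search.
my $returnedstring = 'acct';   # hypothetical hit
print trim_before('ttttacctcccc', qr/\Q$returnedstring\E/), "\n";      # acctcccc
```

Extending the wildcard pattern to 2 mismatches means enumerating every pair of positions, which grows quickly with primer length; at that point a mismatch-counting loop is probably simpler.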
