PerlMonks  

Re^2: fuzzy match: trim sequences outside of the forward and reverse primer set.

by lrl1997 (Novice)
on Nov 08, 2012 at 15:24 UTC


in reply to Re: fuzzy match: trim sequences outside of the forward and reverse primer set.
in thread fuzzy match: trim sequences outside of the forward and reverse primer set.

Hi grizzley, by "fuzzy match" I mean a match that is not perfect. For example, say I have the forward primer "agct" and I want to find it in the following sequences and trim off the region before it:

>seq1

aaagctcccc

>seq2

aaacctgggg

If I perform a "perfect match" search and trim, only seq1 contains "agct"; after the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.
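For the perfect-match case, a minimal sketch using only Perl built-ins (index and substr) could look like this; the %seqs hash just holds the two example sequences and is not part of any library:

```perl
use strict;
use warnings;

my %seqs   = (seq1 => 'aaagctcccc', seq2 => 'aaacctgggg');
my $primer = 'agct';

for my $id (sort keys %seqs) {
    # index() gives the 0-based offset of the first exact occurrence, or -1.
    my $pos = index($seqs{$id}, $primer);
    # Keep the primer itself: drop only the bases in front of it.
    $seqs{$id} = substr($seqs{$id}, $pos) if $pos >= 0;
    print ">$id\n$seqs{$id}\n";
}
```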

For a "fuzzy match" search that allows up to 1 or 2 mismatches to "agct", both seq1 and seq2 would be trimmed: seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch (the "g" substituted by "c"). After the trim, the result should be:

>seq1

agctcccc

>seq2

acctgggg
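One way to get exactly this output is to slide a window of the primer's length along each sequence and count substitutions (Hamming distance) in each window. This sketch assumes mismatches are substitutions only, as in the example, and that each sequence is a single-line string; `mismatches` and `trim_before_primer` are hypothetical helper names. It keeps the window with the fewest mismatches (leftmost on ties), so with a limit of 2 seq2 is still trimmed at "acct" rather than at the 2-mismatch window "aacc" one base earlier.

```perl
use strict;
use warnings;

# Number of substitutions between two equal-length strings. String XOR
# yields "\0" wherever the characters agree; tr/\0//c counts the other bytes.
sub mismatches {
    my ($x, $y) = @_;
    my $xor = $x ^ $y;
    return $xor =~ tr/\0//c;
}

# Trim everything before the best primer hit: the window with the fewest
# mismatches wins (leftmost on ties). The sequence is returned unchanged
# when no window has at most $max_mm mismatches.
sub trim_before_primer {
    my ($seq, $primer, $max_mm) = @_;
    my $len = length $primer;
    my ($best_pos, $best_mm) = (-1, $max_mm + 1);
    for my $i (0 .. length($seq) - $len) {
        my $mm = mismatches(substr($seq, $i, $len), $primer);
        ($best_pos, $best_mm) = ($i, $mm) if $mm < $best_mm;
        last if $mm == 0;    # a perfect hit cannot be beaten
    }
    return $best_pos >= 0 ? substr($seq, $best_pos) : $seq;
}

my %seqs = (seq1 => 'aaagctcccc', seq2 => 'aaacctgggg');
for my $id (sort keys %seqs) {
    print ">$id\n", trim_before_primer($seqs{$id}, 'agct', 1), "\n";
}
```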

But there are many possible combinations: with 1 mismatch, "agct" could appear as "acct", "ggct", etc. Bio::Grep can do this kind of fuzzy search, but it only outputs the sequences that contain such regions; as far as I can tell, it does not trim them afterwards. I do not know how to write a Perl program to do so. I would really appreciate your help.
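Rather than listing every combination, one option for the 1-mismatch case is a regex that puts a wildcard at each primer position in turn ('agct' becomes .gct|a.ct|ag.t|agc.), which covers every single substitution at once. A minimal sketch; `one_mismatch_regex` is a hypothetical helper. Note that the regex engine stops at the leftmost hit, which may be a mismatched window even when a perfect copy of the primer appears later in the sequence.

```perl
use strict;
use warnings;

# Regex for $primer with at most one substituted base: an alternation with
# a wildcard at each position in turn, e.g. 'agct' -> .gct|a.ct|ag.t|agc.
sub one_mismatch_regex {
    my ($primer) = @_;
    my @alts;
    for my $i (0 .. length($primer) - 1) {
        push @alts, quotemeta(substr($primer, 0, $i)) . '.'
                  . quotemeta(substr($primer, $i + 1));
    }
    my $alt = join '|', @alts;
    return qr/(?:$alt)/;
}

my $re = one_mismatch_regex('agct');
my @trimmed;
for my $seq ('aaagctcccc', 'aaacctgggg') {
    # Non-greedy .*? plus lookahead: remove only the bases before the
    # leftmost hit, keeping the (possibly mismatched) primer itself.
    (my $t = $seq) =~ s/^.*?(?=$re)//;
    push @trimmed, $t;
    print "$t\n";
}
```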


Re^3: fuzzy match: trim sequences outside of the forward and reverse primer set.
by grizzley (Chaplain) on Nov 09, 2012 at 07:46 UTC
    In that case there are at least two possibilities:
      1. If Bio::Perl can match with wildcards, you can do a fuzzy match on 'agct.*'.
      2. Do the fuzzy match with Bio::Perl and use the returned matched string for a perfect match, or better, a substitution: s/^.*?(?=\Q$returnedstring\E)//
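Option 2 in practice, assuming the search tool did report the hit string (the `$hit` value here is made up for illustration):

```perl
use strict;
use warnings;

my $seq = 'aaacctgggg';
my $hit = 'acct';    # pretend the fuzzy search reported this hit

# \Q...\E escapes any regex metacharacters in $hit; the ^ anchor and the
# non-greedy .*? make the substitution remove only the bases before the
# first occurrence of $hit.
(my $trimmed = $seq) =~ s/^.*?(?=\Q$hit\E)//;
print "$trimmed\n";    # acctgggg
```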

      I don't think it returns the fuzzy-matched string, only the sequence containing it, so I have no way to know which string was found. Any more suggestions?

        Not much. I thought a Perl package would behave in a Perl-ish way, returning the matched part of the string and storing the "before" and "after" strings somewhere. What remains is to implement the fuzzy matching yourself.
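For comparison, this is the Perl-ish behaviour meant above: after a successful regex match, the special arrays @- and @+ hold the start and end offsets of the match, so the before, matched and after parts can be cut out with substr. (Perl also offers $`, $& and $', but on perls before 5.20 using them anywhere slows down every regex in the program.) The pattern /a.ct/ is a hand-written 1-mismatch pattern chosen only for illustration:

```perl
use strict;
use warnings;

my $seq = 'aaacctgggg';
my ($before, $match, $after);
if ($seq =~ /a.ct/) {
    # $-[0] / $+[0]: start and end offsets of the whole match.
    $before = substr($seq, 0, $-[0]);
    $match  = substr($seq, $-[0], $+[0] - $-[0]);
    $after  = substr($seq, $+[0]);
}
print "$before | $match | $after\n";    # aa | acct | gggg
```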

        If the fuzzy matching is defined by the number of differences between strings, then maybe Text::Levenshtein is of use? I mean iterating in a simple loop like this:

        use Text::Levenshtein qw(distance);
        my $found = 0;
        for my $i (0 .. length($str) - $len_of_match) {
            if (distance(substr($str, $i, $len_of_match), $matchstring) <= $differences_limit) {
                $found = 1;    # Perl has no bareword 'true'
                last;
            }
        }
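If Text::Levenshtein is not installed, the same idea can be tried with a small pure-Perl edit distance. In this sketch `edit_distance` is a hypothetical stand-in for Text::Levenshtein::distance (insertions, deletions and substitutions each cost 1), and `trim_at_first_hit` is the window loop above, returning the trimmed sequence instead of a flag. Like that loop, it stops at the first (leftmost) window within the limit, so with a limit of 2 seq2 would be trimmed one base earlier, at "aacc".

```perl
use strict;
use warnings;

# Classic dynamic-programming edit distance, keeping only the previous row.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distances from '' to each prefix of $t
    for my $i (1 .. length $s) {
        my @cur = ($i);             # $i deletions turn a prefix of $s into ''
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            my $best = $prev[$j] + 1;            # deletion
            my $ins  = $cur[$j - 1] + 1;         # insertion
            my $sub  = $prev[$j - 1] + $cost;    # substitution or match
            $best = $ins if $ins < $best;
            $best = $sub if $sub < $best;
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

sub trim_at_first_hit {
    my ($str, $matchstring, $differences_limit) = @_;
    my $len_of_match = length $matchstring;
    for my $i (0 .. length($str) - $len_of_match) {
        my $window = substr($str, $i, $len_of_match);
        return substr($str, $i)
            if edit_distance($window, $matchstring) <= $differences_limit;
    }
    return $str;    # no window close enough: leave the sequence untouched
}

print trim_at_first_hit('aaacctgggg', 'agct', 1), "\n";    # acctgggg
```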
