PerlMonks  

similar string matching

by Murcia (Monk)
on Jul 05, 2004 at 09:11 UTC (#371826)
Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I have a slightly tricky task at work. It is a string matching problem.

I want to compare two amino acid sequences (in one-letter code, where each character represents one amino acid) to find the positions at which one sequence lies within the other. An example:

MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA # 1. sequence
TTTTTTTTFTTTTTTTTTTTT # 2. sequence

The result: sequence 2 lies at positions 14 to 34 in sequence 1.

Simple? (For this I need no help!)
Here are some harder examples:

SUBSTITUTION
AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA # 1. sequence
TTTTFTTTATTTTTTDTTTTT # 2. sequence

DELETION
AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA # 1. sequence
TTGTTTTTTTTTTTTTTTTTM # 2. sequence

REVERSE
TTTTTTTTTTTTTTTTTTTT # 1. sequence
AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA # 2. sequence

PERFECT MATCHING ONLY AT BEGIN AND END OF 2. SEQUENCE
AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA # 1. sequence
TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT # 2. sequence

I tried regular expressions and the String::Approx module (aslice with the 'minimal_distance' option), but I don't like the values that module returns.

Any hints on the best way to do this?

Murcia

Edited by Chady -- fixed formatting.
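One possible way to handle the SUBSTITUTION and DELETION cases without String::Approx is a small dynamic-programming search for the substring of the longer sequence that has the lowest edit distance to the shorter one. The following is only a sketch under that assumption, not a method anyone in this thread proposed. The function name best_approx_match is made up for illustration, and all edits cost 1, which is probably too crude for real amino acid scoring.

```perl
use strict;
use warnings;

# Sketch: find the substring of $text with the smallest edit distance to $pat.
# Returns (start, end, distance); start and end are 1-based, inclusive.
# best_approx_match is a hypothetical name, not part of any module.
sub best_approx_match {
    my ($text, $pat) = @_;
    my @t = split //, $text;
    my @p = split //, $pat;
    my ($n, $m) = (scalar @t, scalar @p);

    # $d[$i][$j]: fewest edits needed to align the first $i chars of $pat
    # with some substring of $text that ends at position $j.
    # Row 0 is all zeros, so a match may start anywhere in $text for free.
    my @d;
    $d[0][$_] = 0 for 0 .. $n;
    for my $i (1 .. $m) {
        $d[$i][0] = $i;
        for my $j (1 .. $n) {
            my $sub = $d[$i-1][$j-1] + ($p[$i-1] eq $t[$j-1] ? 0 : 1);
            my $del = $d[$i-1][$j] + 1;   # pattern char has no partner in text
            my $ins = $d[$i][$j-1] + 1;   # text char has no partner in pattern
            my $best = $sub;
            $best = $del if $del < $best;
            $best = $ins if $ins < $best;
            $d[$i][$j] = $best;
        }
    }

    # Pick the leftmost end column with the lowest distance in the last row.
    my $end = 0;
    for my $j (1 .. $n) {
        $end = $j if $d[$m][$j] < $d[$m][$end];
    }

    # Trace back to row 0 to recover where the match starts.
    my ($i, $j) = ($m, $end);
    while ($i > 0) {
        if ($j > 0
            && $d[$i][$j] == $d[$i-1][$j-1] + ($p[$i-1] eq $t[$j-1] ? 0 : 1)) {
            $i--; $j--;
        }
        elsif ($d[$i][$j] == $d[$i-1][$j] + 1) {
            $i--;
        }
        else {
            $j--;
        }
    }
    return ($j + 1, $end, $d[$m][$end]);
}

my ($from, $to, $dist) = best_approx_match("AAAAATTTTFTTTTAAAA", "TTTTGTTTT");
print "best match at $from to $to with $dist edit(s)\n";
```

The table needs length(text) times length(pattern) cells, which should be fine for sequences of the size shown here but may get slow for whole proteomes. A distance threshold relative to the pattern length would be needed to decide what counts as a hit.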

Re: similar string matching
by Anonymous Monk on Jul 05, 2004 at 09:23 UTC

    Have you tried index?

    my $s = "MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA";
    my $f = "TTTTTTTTFTTTTTTTTTTTT";
    print index($s, $f);   # gives 13 (0-based offset, i.e. position 14)
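Since index returns a 0-based offset and the question asks for a "from ... to ..." range, a thin wrapper may be handy. This is just a sketch: locate_exact is a made-up name, and swapping the arguments so that the shorter string is always searched for in the longer one is an assumption about how the REVERSE case should be handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): locate the shorter sequence
# inside the longer one and report 1-based, inclusive positions.
sub locate_exact {
    my ($x, $y) = @_;
    # Swap so that $text is always the longer string (assumed REVERSE handling).
    my ($text, $pat) = length($x) >= length($y) ? ($x, $y) : ($y, $x);
    my $offset = index($text, $pat);   # 0-based, -1 if absent
    return if $offset < 0;
    return ($offset + 1, $offset + length $pat);
}

my @hit = locate_exact("AAAATTFTTAAAA", "TTFTT");
print @hit ? "match at $hit[0] to $hit[1]\n" : "no exact match\n";
```

This only covers exact matches. An empty return list means the inexact cases (substitution, deletion) need something else.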
Re: similar string matching
by Crian (Chaplain) on Jul 05, 2004 at 10:02 UTC
    I think I need a bit more information.

    For SUBSTITUTION: Do I get the part to substitute as a parameter, or do I have to guess it (starts with TT and ends with TT, or something like that)?

    Same question for DELETION: Do I have to guess what is going to be deleted?

    How long, or rather how short, may the matching parts at the beginning and the end be for a match to count as successful?

    Please describe your needs a little more precisely.
      What counts as a successful return is a good question! It is quite difficult to define. I want the best values for precision and recall. DELETIONS: I think a minimum of 5 matching amino acids at both ends is OK for a successful return. Murcia
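The "minimum 5 amino acids at both ends" rule could be tried directly by anchoring on the first and last five residues of sequence 2 and ignoring whatever lies between them. This is a rough sketch of that idea only. The name locate_by_anchors and the default of 5 are illustrative, and it simply takes the first head match and the first tail match after it.

```perl
use strict;
use warnings;

# Hypothetical helper: anchor the query by its first and last $k residues.
# Returns (start, end) as 1-based inclusive positions in $text,
# or an empty list if either anchor is missing.
sub locate_by_anchors {
    my ($text, $query, $k) = @_;
    $k = 5 unless defined $k;            # 5 is the minimum suggested above
    return if length($query) < 2 * $k;   # anchors would overlap in the query

    my $head = substr($query, 0, $k);
    my $tail = substr($query, -$k);

    my $start = index($text, $head);
    return if $start < 0;
    # Look for the tail only after the head, so the anchors cannot overlap.
    my $tail_at = index($text, $tail, $start + $k);
    return if $tail_at < 0;

    return ($start + 1, $tail_at + $k);
}

my @span = locate_by_anchors("GGGGACDEFXXXGHIKLGGGG", "ACDEFGHIKL");
print @span ? "anchored span: $span[0] to $span[1]\n" : "anchors not found\n";
```

In repetitive stretches such as the long T runs in the examples, the first head and first tail found may not be the right ones, so the span can come out shorter or longer than expected. Comparing the span length with the length of sequence 2 is a cheap sanity check, and it also gives a rough handle on precision versus recall.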
