Murcia has asked for the wisdom of the Perl Monks concerning the following question:
I have a slightly tricky task at work. It is a string matching problem.
I want to compare two amino acid sequences (in one-letter code, where a single character represents one amino acid) to find at which positions one sequence lies within the other. An example:
MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA # 1. sequence
TTTTTTTTFTTTTTTTTTTTT # 2. sequence

The result is: 2. lies at position 14 to 34 in 1.
Simple? (For this I need no help!)
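For reference, the exact case can be sketched with Perl's built-in index function. This is only a minimal illustration of the easy case; the sub name exact_position is made up for this example, and positions are counted from 1 as in the result line quoted in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based start and end positions of
# $query inside $target, or an empty list when there is no exact match.
sub exact_position {
    my ($target, $query) = @_;
    my $offset = index($target, $query);    # 0-based offset, -1 if absent
    return if $offset < 0;
    return ($offset + 1, $offset + length $query);
}

my $seq1 = 'MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA';
my $seq2 = 'TTTTTTTTFTTTTTTTTTTTT';

# Report the position only when an exact match exists.
if (my ($start, $end) = exact_position($seq1, $seq2)) {
    print "2. lies at position $start to $end in 1.\n";
}
```

This approach only works when the shorter sequence occurs unchanged in the longer one, which is exactly why the cases below are the hard part.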
SUBSTITUTION
AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA # 1. sequence
TTTTFTTTATTTTTTDTTTTT # 2. sequence

DELETION
AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA # 1. sequence
TTGTTTTTTTTTTTTTTTTTM # 2. sequence

REVERSE
TTTTTTTTTTTTTTTTTTTT # 1. sequence
AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA # 2. sequence

PERFECT MATCHING ONLY AT BEGIN AND END OF 2. SEQUENCE
AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA # 1. sequence
TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT # 2. sequence
I tried regular expressions and the String::Approx module, using aslice with the 'minimal_distance' option, but I don't like the values that module returns.
Any hints on how to do this "the best way"?
Edited by Chady -- fixed formatting.