http://www.perlmonks.org?node_id=371799
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I have a slightly tricky task at work. It is a string matching problem: I want to compare two amino acid sequences (in one-letter code, where a single character represents one amino acid) and find the positions at which one sequence lies within the other. An example:

MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA # 1. sequence
TTTTTTTTFTTTTTTTTTTTT # 2. sequence
result is: 2. lies at position 14 to 34 in 1.
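
The exact case above can be sketched with Perl's built-in index. This is only a minimal illustration; the helper name locate and the variable names are made up for this example, and positions are reported 1-based and inclusive to match the "14 to 34" convention used above.

```perl
use strict;
use warnings;

# Return the 1-based (start, end) of the first exact occurrence of $needle
# in $haystack, or an empty list if there is none.
# (locate is a hypothetical helper name, not part of any module.)
sub locate {
    my ($haystack, $needle) = @_;
    my $offset = index($haystack, $needle);   # 0-based offset, -1 when absent
    return if $offset < 0;
    return ($offset + 1, $offset + length $needle);
}

my $seq1 = 'MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA';
my $seq2 = 'TTTTTTTTFTTTTTTTTTTTT';

if (my ($start, $end) = locate($seq1, $seq2)) {
    print "2. lies at position $start to $end in 1.\n";
}
```

This only handles perfect matches; a single substitution or deletion makes index return -1.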
Simple? Here are some harder examples:
SUBSTITUTION
AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA # 1. sequence
TTTTFTTTATTTTTTDTTTTT # 2. sequence

DELETION
AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA # 1. sequence
TTGTTTTTTTTTTTTTTTTTM # 2. sequence

REVERSE
TTTTTTTTTTTTTTTTTTTT # 1. sequence
AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA # 2. sequence

PERFECT MATCHING ONLY AT BEGIN AND END OF 2. SEQUENCE
AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA # 1. sequence
TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT # 2. sequence

I tried regular expressions and the String::Approx module, using aslice with the 'minimal_distance' option, but I don't like the values this module returns.
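
For comparison, approximate matching with explicit return values can be sketched in pure Perl with a dynamic-programming edit-distance search (the approach usually attributed to Sellers). This is not the String::Approx API and not necessarily "the best way"; approx_locate is a hypothetical name, every substitution, insertion, and deletion is assumed to cost 1, and no biological scoring matrix is used.

```perl
use strict;
use warnings;

# Find the substring of $text with the smallest edit distance to $pattern.
# Returns (start, end, distance); start and end are 1-based and inclusive.
# (approx_locate is a hypothetical helper; unit costs are an assumption.)
sub approx_locate {
    my ($text, $pattern) = @_;
    my @t = split //, $text;
    my @p = split //, $pattern;
    my $m = @p;

    # Column for the empty text prefix: i pattern characters are unmatched.
    my @dist  = (0 .. $m);
    my @start = (0) x ($m + 1);     # 0-based offset where each match begins
    my ($best_d, $best_s, $best_e) = ($m, 0, 0);

    for my $j (1 .. @t) {
        # Row 0 costs nothing: a match may begin at any offset of the text.
        my @nd = (0);
        my @ns = ($j);
        for my $i (1 .. $m) {
            # Align pattern char i with text char j (free if they are equal).
            my ($d, $s) = ($dist[$i - 1] + ($p[$i - 1] eq $t[$j - 1] ? 0 : 1),
                           $start[$i - 1]);
            # Text char j has no partner in the pattern.
            ($d, $s) = ($dist[$i] + 1, $start[$i]) if $dist[$i] + 1 < $d;
            # Pattern char i has no partner in the text.
            ($d, $s) = ($nd[$i - 1] + 1, $ns[$i - 1]) if $nd[$i - 1] + 1 < $d;
            push @nd, $d;
            push @ns, $s;
        }
        # Keep the first (leftmost-ending) match with the lowest distance.
        ($best_d, $best_s, $best_e) = ($nd[$m], $ns[$m], $j) if $nd[$m] < $best_d;
        @dist  = @nd;
        @start = @ns;
    }
    return ($best_s + 1, $best_e, $best_d);
}

# The SUBSTITUTION example from above.
my ($s, $e, $d) = approx_locate(
    'AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA',
    'TTTTFTTTATTTTTTDTTTTT');
print "2. lies at position $s to $e in 1. (edit distance $d)\n";
```

The sketch runs in time proportional to the product of the two lengths and reports only one best match. For the REVERSE case, where the first sequence is the shorter one, the arguments would have to be swapped so that the shorter sequence is the pattern.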

Any hints on the best way to do this?

Murcia

edit (broquaint): dropped <pre> tags and added formatting