Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: similar string matching

by BrowserUk (Pope)
on Jul 05, 2004 at 13:41 UTC ( #371877=note: print w/replies, xml ) Need Help??


in reply to similar string matching

Needs lots more testcases and no promises about speed, but this seems to work for the cases you outlined.

Results: (underscores show the offset; lowercase left partial match; uppercase right partial match.

P:\test>371799 Searching: MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA for : TTTTTTTTFTTTTTTTTTTTT MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA _____________TTTTTTTTFTTTTTTTTTTTT Searching: AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA for : TTTTFTTTATTTTTTDTTTTT AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA ______________ttttfttttttttttttttTT Searching: AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA for : TTGTTTTTTTTTTTTTTTTTM AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA _____________ttgttttttt-----TTTTTTTTTTM Searching: TTTTTTTTTTTTTTTTTTTT for : AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA ____________TTTTTTTTTTTTTTTTTTTT Searching: AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA for : TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA ___________ttttttttgggggggggggggggGGGGGGTTTTTTTTT

Code

#! perl -slw use strict; use List::Util qw[ reduce ]; # PAIR: ##Removed after testing while( my $haystack = <DATA> ) { $haystack =~ s[^\s*(\S+?)\b.*$][$1]s; ## Strips comments ( my $needle = <DATA> ) =~ s[^\s*(\S+?)\b.*$][$1]s; print "\nSearching: $haystack\nfor\t: $needle\n"; ( $haystack, $needle ) = ( $needle, $haystack ) if length( $haystack ) < length( $needle ); ## Reversed ## Try for a straight match while( $haystack =~ m[$needle]g ) { printf "$haystack\n%s%-*s\n\n" , '_' x (pos( $haystack ) - length( $needle )) , length( $haystack ) , $needle; ## next PAIR; ## Removed after testing } ## No joy, so try for a left-right partial. my $re_left = reduce{ "$a$b?" } split '', $needle; my $re_right = reduce{ "$a?$b" } split '', $needle; my $re = qr[(($re_left)(.*?)($re_right))]; while( $haystack =~ m[$re]g ) { printf "$haystack\n%s%-*s%s%*s\n\n" , '_' x ( pos( $haystack ) - length( $1 ) ) , length( $2 ) , lc($2) , '-' x length( $3 ) , length( $4 ) , $4; # next PAIR; ## Removed after testing } } __DATA__ MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA # 1. sequen +ce TTTTTTTTFTTTTTTTTTTTT # 2. sequen +ce AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA # 1. sequen +ce TTTTFTTTATTTTTTDTTTTT # 2. sequen +ce AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA # 1. sequen +ce TTGTTTTTTTTTTTTTTTTTM # 2. sequen +ce TTTTTTTTTTTTTTTTTTTT # 1. sequen +ce AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA # 2. sequen +ce AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA # 1.sequenc +e TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT # 2. Sequen +ce

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://371877]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (2)
As of 2018-08-19 00:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Asked to put a square peg in a round hole, I would:









    Results (186 votes). Check out past polls.

    Notices?