Do you know where your variables are? PerlMonks

### Re: Can KMP be used to find the longest common subsequence?

by Thilosophy (Curate)
 on Dec 12, 2007 at 07:34 UTC ( #656561=note: print w/replies, xml ) Need Help??

Some background context:

Wikipedia has an explanation of the Knuth-Morris-Pratt algorithm for substring searches.

The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).

• Comment on Re: Can KMP be used to find the longest common subsequence?

Replies are listed 'Best First'.
Re^2: Can KMP be used to find the longest common subsequence?
by bart (Canon) on Dec 12, 2007 at 09:44 UTC
The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).
In that case, I think the easiest way to search for subsequences in Perl is to use regular expressions. For example, if the subsequence is "abcd", then the regular expression to look for a subsequence is /a.*?b.*?c.*?d/.

And since you can easily contruct a regex out of a string, that is easy toconstruct:

```\$string = 'abcd';
\$regex = join '.*?', split //, \$string;
\$sequence = 'xxxxxaxxxxxxbxxxxxcxxxxdxxxxx';
if(\$sequence =~ /\$regex/s) {
printf "I found a match, at offset %d and with total length %d\n",
+ \$-[0], \$+[0]-\$-[0];
}
Result:
I found a match, at offset 5 and with total length 19

If that were to fail to match, then it would fail to match very badly, as in O(N**K) badly. Better:

```/^[^a]*?a[^b]*?b[^c]*?c[^d]*?d/

or use features that I don't use and so don't remember to prevent backtracking. The above version should proceed to match the string linearly and in O(n) and, if that fails, backtrack linearly in O(n) and quickly fail.

- tye

Good point.
or use features that I don't use and so don't remember to prevent backtracking.
I do remember, you must be talking about the "cut" assertion: (?>PATTERN). See japhy's drafts for his (unfortunately) abandoned book on regexes, at http://japhy.perlmonk.org/book/book/ , chapter 8 (MS Word format), section 8.1.4.

Only, it won't work here, because If I did /a(?>.*?)b/, it would never match, because once the .*? ate a "b", it would never give it back.

```print 'xxxxaxxxxbxxxx' =~ /a(?>.*?)b/ ? 'match' : 'fail';
produces fail.

So yours is the better solution... you don't need the "?", because /^[^a]*?a/ and /^[^a]*a/ will match the exact same substring. I have no idea which is faster, or whether it even makes a difference, so I'll just leave it in.

```\$string = 'abcd';
\$regex = '^' . join '', map "[^\$_]*\$_", map quotemeta, split //, \$stri
+ng;
# embed capture to get offset:
\$regex =~ s/(?<=\*)/(/; \$regex =~ s/\$/)/;
\$sequence = 'xxxxxaxxxxxxbxxxxxcxxxxdxxxxx';
if(\$sequence =~ /\$regex/s) {
printf "I found a match, at offset %d and with total length %d\n",
+ \$-[1], \$+[1]-\$-[1];
}
As a precaution I've included a quotemeta so you can safely search for subsequences containing \W characters.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://656561]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (3)
As of 2022-05-28 08:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Do you prefer to work remotely?

Results (98 votes). Check out past polls.

Notices?