Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Algorithm::Loops::NestedLoops and LCS sequence

by hv (Prior)
on Jun 13, 2004 at 14:36 UTC ( [id://363846]=perlquestion: print w/replies, xml ) Need Help??

hv has asked for the wisdom of the Perl Monks concerning the following question:

Following a question to the SeqFan mailing list, I'm trying to do some initial investigation of this problem:

Given n, across all the possible pairs of strings of length n, what is a(n), the maximum number of common subsequences of the same length as the longest common subsequence ('LCS')?

As an example, "aaab" and "aabb" have an LCS of length 3; there is only one distinct common subsequence, "aab", but that string can be extracted as a subsequence of "aaab" in 3 different ways, and of "aabb" in 2 different ways, giving 6 combinations: so a(4) >= 6.

Here's a routine to calculate the longest common subsequence, based on the algorithm described at the bottom of this page:

sub lcs { my($x, $y) = @_; my(@v, $cx, $cy, $left, $above); for my $xi (0 .. length($x) - 1) { $cx = substr $x, $xi, 1; for my $yi (0 .. length($y) - 1) { $cy = substr $y, $yi, 1; if ($cx eq $cy) { $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0); } else { $left = ($xi && $v[$xi - 1][$yi]) || 0; $above = ($xi && $v[$xi][$yi - 1]) || 0; $v[$xi][$yi] = ($left > $above) ? $left : $above; } } } return $v[length($x) - 1][length($y) - 1]; }

Now, counting the number of common subsequences of that length is tricky. The approach I've taken is to use Algorithm::Loops::NestedLoops to find all the distinct subsequences of the right length in the first string, and then count the number of ways that subsequence can be extracted from each of the two strings, multiply the results, and add to the sum. Here's "count the number of ways of matching a subsequence in a string":

sub matchss { my($ss, $str) = @_; my @state = (1, (0) x length($ss)); my %index; unshift @{ $index{substr $ss, $_ - 1, 1} }, $_ for 1 .. length($ss); for (split //, $str) { $state[$_] += $state[$_ - 1] for @{ $index{$_} || [] }; } pop @state; }

This works by walking through the target string, counting the number of ways it could be matching up to any particular point in the subsequence as it goes.

Finally, the main subroutine which finds the distinct subsequences of the right length, and counts up the results:

sub lcscount { my($x, $y) = @_; my $n = lcs($x, $y) or return 1; my %seen; my $count = 0; my @x = split //, $x; NestedLoops( [ [ 0 .. $#x ], ( sub { [ $_ + 1 .. $#x ] } ) x ($n - 1), ], sub { my $ssx = join '', @x[@_]; return if $seen{$ssx}++; $count += (matchss($ssx, $y) or return) * matchss($ssx, $x); }, ); $count; }

I originally had the $seen{$ssx}++ check in a separate OnlyWhen block, expecting that the second time it saw (say) "aa", the check there would stop A::L from generating all the "aaxyz..." subsequences, all of which would already have been catered for; rereading the docs however suggested that it wouldn't do this (is there any way to do this?) so I took it out.

As you may have guessed by now, this is my first attempt to use A::L and I don't by any means feel that I fully understand it, so any suggestions for improvements would be gratefully received.

Given that it is not aborting on repeated partial subsequences, the NestedLoops call is actually counting the same as matchss($ssx, $x), so the $count calculation could actually be moved to a later:

$count += $seen{$_} * matchss($_, $y) for keys %seen;
.. but I'm holding out for a way to abort them, in which case I think the current approach would end up faster.

Hugo

Replies are listed 'Best First'.
Re: Algorithm::Loops::NestedLoops and LCS sequence (prune)
by tye (Sage) on Jun 14, 2004 at 07:11 UTC

    Yes, the OnlyWhen routine does not "prune" deeper looping, it only controls whether your other code is called right then or not (which makes it seem rather useless when you look at it in some ways :).

    As I was putting the module together, I tried a few different strategies for getting OnlyWhen to prune (by returning more than just true or false) and with an extra hook or two that could also be used to that purpose. I eventually backed off because it became clear that none of the schemes I had come up with were particularly great. So I hope to attack that challenge again and come up with a good scheme for adding more hooks to make it easy to prune (and to do other things).

    In the mean time, you can use the technique, that I used in Re: Better algorithm than brute-force stack for combinatorial problems? (A::L), of having the loop specification code return [] if no deeper looping is desired.

    - tye        

      Ah, got it. I've modified the call to:

      NestedLoops( [ ( sub { my $prefix = join '', @x[@_]; [ grep !$seen{$prefix . $x[$_]}++, ($_ ? $_ + 1 : @_) .. $#x ] +; } ) x $n, ], sub { my $ssx = join '', @x[@_]; $count += (matchss($ssx, $y) or return) * matchss($ssx, $x); }, );
      .. and that's working fine; a dump of %seen confirms that it is seeing only genuine repeats.

      Of course, now that I've unified the loop specifier, it feels slightly wasteful having to create $n copies of the same subref, but I can live with that. :)

      Thanks for the help.

      Hugo

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://363846]
Approved by TStanley
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2024-03-19 02:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found