Re: Longest Common SubSequence Not Working Correctly

Thanks. I forgot to include the

memoize('lcs');

that's why my code took so long to run. I also found this brute force method on PerlMonks and compared the two methods, and the brute force method actually ran faster. This does not make sense at all.
LCSS method:

# lcs.pl
use strict;
use Memoize;

sub longerOf {
    my ($x, $y) = @_;
    return (length $x > length $y) ? $x : $y;
}

memoize('lcs');
sub lcs {
    my ($a, $b) = @_;

    if ($a eq "" || $b eq ""){
        return "";
    }

    my ($az, $bz) = (chop $a, chop $b);
    if ($az eq $bz){
        return lcs($a, $b) . $az;
    } else {
        return longerOf(
                  lcs($a . $az, $b),    # $a restored, $b one character shorter
                  lcs($a, $b . $bz));   # $b restored, $a one character shorter
    }
}

while (1){
    print "1: "; my $a = <>; chomp $a;
    print "2: "; my $b = <>; chomp $b;

    # $start and $end must be declared with my under "use strict"
    my $start = time();
    print "LCS: ", lcs($a, $b), "\n\n";
    my $end = time();
    print "<br>Time taken was ", ($end - $start), " seconds";

    $start = time();
    print "Brute Force: ", lcsbruteforce($a, $b), "\n\n";
    $end = time();
    print "<br>Time taken was ", ($end - $start), " seconds";
}


sub lcsbruteforce {
  my($x, $y) = @_;
  my(@v, $cx, $cy, $left, $above);
  for my $xi (0 .. length($x) - 1) {
    $cx = substr $x, $xi, 1;
    for my $yi (0 .. length($y) - 1) {
      $cy = substr $y, $yi, 1;
      if ($cx eq $cy) {
        $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0);
      } else {
        $left = ($xi && $v[$xi - 1][$yi]) || 0;
        $above = ($yi && $v[$xi][$yi - 1]) || 0;   # guard on $yi: this reads column $yi - 1
        $v[$xi][$yi] = ($left > $above) ? $left : $above;
      }
    }
  }
  return $v[length($x) - 1][length($y) - 1];
}

Replies are listed 'Best First'.
Re^2: Longest Common SubSequence Not Working Correctly by blokhead (Monsignor) on Nov 13, 2007 at 20:01 UTC
Don't let the name of the subroutine fool you. The "brute force" algorithm is not really "brute forcing" the problem. A brute force approach would be to consider every possible subsequence of the strings, taking O(2^min(x,y)) time.

In fact, the "brute force" algorithm is doing the same thing as the recursive algorithm (i.e., the dynamic programming solution), but iteratively. It uses a standard trick for making a recursive memoizing dynamic programming algorithm iterative. Since the two algorithms solve the problem in essentially the same way, but the iterative one doesn't have the overhead of subroutine calls (which are slow in Perl), it is no surprise that the iterative one is faster.

Usually it's easier and more intuitive to write a dynamic programming problem in terms of recursive calls. However, it's necessary to memoize the result of each recursive call, because several other subproblems might use the result of this subproblem in their computation.

Now imagine a table that holds all of these memoized results. What happens to this table while the recursive algorithm is running? The table is gradually filling up. How does it fill up? Well, in this case, to compute the value of the subproblem ($a,$b), I need the solutions for at most these three subproblems:

($a, substr($b,0,-1))
(substr($a,0,-1), $b)
(substr($a,0,-1), substr($b,0,-1))

In other words, I need to have those 3 cells in the table filled in before I can fill in this cell.

So suppose I now do things iteratively instead of recursively, and just concentrate on filling up the table. I'll visit the table's cells in such an order that I reach the cell ($a,$b) only after the three cells above. That way, to fill in the cell ($a,$b), I just check those 3 other cells, do some local comparisons, and I'm done. Finally, the last cell in the table is generally the answer to the "main" subproblem, and I can return that. That's exactly what this "brute force" algorithm is doing.

blokhead
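The table-filling order described above can be sketched as follows. This is only an illustration, not code from the thread: the name `lcs_length_table` is hypothetical, and the table is padded with an extra row and column of zeros (standing for the empty prefix) so that the three neighbouring cells always exist.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): length of the LCS, computed by
# filling a (length($x) + 1) x (length($y) + 1) table row by row.
sub lcs_length_table {
    my ($x, $y) = @_;
    my ($n, $m) = (length($x), length($y));

    # Row 0 and column 0 stand for the empty prefix, so they stay 0.
    my @t = map { [ (0) x ($m + 1) ] } 0 .. $n;

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            if (substr($x, $i - 1, 1) eq substr($y, $j - 1, 1)) {
                # Last characters match: extend the diagonal cell.
                $t[$i][$j] = $t[$i - 1][$j - 1] + 1;
            } else {
                # Otherwise take the better of the two neighbouring cells.
                my ($up, $left) = ($t[$i - 1][$j], $t[$i][$j - 1]);
                $t[$i][$j] = $up > $left ? $up : $left;
            }
        }
    }

    # The last cell answers the "main" subproblem.
    return $t[$n][$m];
}

print lcs_length_table("AGGTAB", "GXTXAYB"), "\n";   # prints 4
```

Because rows are filled top to bottom and each row left to right, the three cells a given cell depends on are always complete before it is visited.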
Re^3: Longest Common SubSequence Not Working Correctly by Anonymous Monk on Nov 14, 2007 at 05:16 UTC
Thanks for the explanation. How would I modify the above brute force code to be truly brute force? Also, the code only prints the length of the sequence, not the characters of the sequence. I tried putting some print statements in between, but that does not seem to work correctly. Any help?
Re^4: Longest Common SubSequence Not Working Correctly by blokhead (Monsignor) on Nov 14, 2007 at 14:58 UTC
The `lcsbruteforce` algorithm maintains this big table of solutions to subproblems. In this example, it's maintaining just the length of the LCS. Just change it to maintain the actual subsequence itself:

sub lcsbruteforce {
  my($x, $y) = @_;
  my(@v, $cx, $cy, $left, $above);
  for my $xi (0 .. length($x) - 1) {
    $cx = substr $x, $xi, 1;
    for my $yi (0 .. length($y) - 1) {
      $cy = substr $y, $yi, 1;
      if ($cx eq $cy) {
        # $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0);
        $v[$xi][$yi] = ($xi && $yi ? $v[$xi-1][$yi-1] : "") . $cx;
      } else {
        # $left = ($xi && $v[$xi - 1][$yi]) || 0;
        # $above = ($yi && $v[$xi][$yi - 1]) || 0;
        # $v[$xi][$yi] = ($left > $above) ? $left : $above;
        $left = ($xi && $v[$xi - 1][$yi]) || "";
        $above = ($yi && $v[$xi][$yi - 1]) || "";   # guard on $yi: this reads column $yi - 1
        $v[$xi][$yi] = length($left) > length($above) ? $left : $above;
      }
    }
  }
  return $v[length($x) - 1][length($y) - 1];
}

To change it to an actual brute force algorithm? That would be pretty strange. The brute force algorithm is:

$best = "";
for every subsequence $s of $x:
    if $s is also a subsequence of $y:
        $best = $s if length($s) > length($best);
return $best;

Of course, the part where you get all subsequences and check for subsequence-ness is a pain. You can probably generate all subsequences using Algorithm::Loops, and perhaps use some regex stuff to check whether a string is a subsequence of another.

blokhead
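For illustration, here is one way the brute force pseudocode might look in plain Perl. It is a sketch under assumptions: instead of Algorithm::Loops it enumerates subsequences with bitmasks, the names `is_subsequence` and `lcs_exhaustive` are hypothetical, and it is only practical for short `$x` (roughly 20 characters or fewer), since it tries all 2**length($x) subsequences.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is a subsequence of $t.
# Builds a regex such as /a.*b.*c/s from the characters of $s.
sub is_subsequence {
    my ($s, $t) = @_;
    return 1 if $s eq "";    # an empty pattern has special meaning in Perl
    my $pattern = join '.*', map { quotemeta($_) } split //, $s;
    return $t =~ /$pattern/s ? 1 : 0;
}

# Hypothetical helper: try every subsequence of $x and keep the longest
# one that is also a subsequence of $y.
sub lcs_exhaustive {
    my ($x, $y) = @_;
    my @chars = split //, $x;
    my $n     = @chars;
    my $best  = "";
    for my $mask (0 .. 2**$n - 1) {
        # Bit $_ of $mask decides whether character $_ of $x is kept.
        my $s = join '', map { $chars[$_] }
                         grep { $mask & (1 << $_) } 0 .. $#chars;
        $best = $s if length($s) > length($best) && is_subsequence($s, $y);
    }
    return $best;
}

print lcs_exhaustive("AGGTAB", "GXTXAYB"), "\n";   # prints GTAB
```

The `quotemeta` call keeps characters such as `.` or `*` in the input from being treated as regex syntax.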
Re^3: Longest Common SubSequence Not Working Correctly by Anonymous Monk on Nov 15, 2007 at 00:49 UTC
Thank you, blokhead. I still have lots to learn about Perl. I have another question for you. The sub lcs takes two parameters ($a, $b), but where the recursive call is made, the program does return lcs($a, $b) . $az; What does the . $az do? I tried printing out $a and $b but did not see any difference. The new string is one character shorter. When I remove $az from the return, the program produces the wrong result. Thanks again for your help.
Re^2: Longest Common SubSequence Not Working Correctly by moritz (Cardinal) on Nov 13, 2007 at 07:54 UTC
I haven't looked too closely at your code, but it seems the "brute force" approach takes `O(length($x) * length($y))` time. The recursive method is a bit harder to estimate (at least for me), but I'll try anyway. With Memoize it will run in quadratic time as well in the worst case, since `lcs` will be called with all possible pairs of position shifts in the arguments. Memoize will only reduce this runtime if at least one of the strings is of the form `$s x $n` with `$n >= 2`. But it uses many more subroutine calls, which tend to be slow in Perl.

To test whether this explanation really holds, you could benchmark both subs with increasingly long strings and check whether their runtimes grow similarly.
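A rough sketch of such a benchmark is below. Several things in it are assumptions rather than anything from the thread: the random four-letter alphabet, the string lengths, the helper names `lcs_table_length` and `random_string`, and the use of Time::HiRes for sub-second timing. The recursive sub is a condensed copy of the one posted above. The memo cache is flushed before each run so earlier runs do not give the recursive version a head start.

```perl
use strict;
use warnings;
no warnings 'recursion';    # recursion depth can pass 100 on longer inputs
use Memoize qw(memoize flush_cache);
use Time::HiRes qw(time);

# Recursive, memoized LCS: a condensed copy of the sub from the thread.
sub lcs {
    my ($s, $t) = @_;
    return "" if $s eq "" || $t eq "";
    my ($sz, $tz) = (chop $s, chop $t);
    return lcs($s, $t) . $sz if $sz eq $tz;
    my $p = lcs($s . $sz, $t);
    my $q = lcs($s, $t . $tz);
    return length($p) > length($q) ? $p : $q;
}
memoize('lcs');

# Iterative table fill that keeps only two rows and returns the LCS length
# (hypothetical name).
sub lcs_table_length {
    my ($x, $y) = @_;
    my @prev = (0) x (length($y) + 1);
    for my $i (1 .. length($x)) {
        my @cur = (0);
        for my $j (1 .. length($y)) {
            if (substr($x, $i - 1, 1) eq substr($y, $j - 1, 1)) {
                $cur[$j] = $prev[$j - 1] + 1;
            } else {
                $cur[$j] = $prev[$j] > $cur[$j - 1] ? $prev[$j] : $cur[$j - 1];
            }
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: random string of length $n over a small alphabet.
sub random_string {
    my ($n) = @_;
    my @alphabet = qw(A C G T);
    return join '', map { $alphabet[rand @alphabet] } 1 .. $n;
}

for my $n (20, 40, 80) {
    my ($x, $y) = (random_string($n), random_string($n));

    flush_cache('lcs');            # start each run with an empty memo table
    my $t0 = time();
    my $rec = lcs($x, $y);
    my $t_rec = time() - $t0;

    $t0 = time();
    my $len = lcs_table_length($x, $y);
    my $t_iter = time() - $t0;

    printf "n=%3d  recursive %.4fs  iterative %.4fs  (LCS length %d)\n",
        $n, $t_rec, $t_iter, $len;
}
```

If both versions are quadratic, doubling `$n` should roughly quadruple each timing. Note that the recursive version builds the subsequence itself while the iterative one only tracks its length, so the comparison mirrors the one in the original post rather than being strictly like for like.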

