Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Longest Common SubSequence Not Working Correctly

by Anonymous Monk
on Nov 13, 2007 at 01:32 UTC ( [id://650419]=note: print w/replies, xml ) Need Help??


in reply to Longest Common SubSequence Not Working Correctly

Thanks. I forgot to include the

memoize('lcs');

that's why my code took so long to run. I also found this brute force method in Perl Monk and ran a comparison on both method and found that the brute force method actually ran faster. This does not make sense at all.
LCSS method:
# lcs.pl use strict; use Memoize; sub longerOf { my ($x, $y) = @_; return (length $x > length $y) ? $x : $y; } memoize('lcs'); sub lcs { my ($a, $b) = @_; if ($a eq "" || $b eq ""){ return ""; } my ($az, $bz) = (chop $a, chop $b); if ($az eq $bz){ return lcs($a, $b) . $az; } else { return longerOf( lcs($a . $az, $b), lcs($b . $bz, $a)); } } while (1){ print "1: "; my $a = <>; chomp $a; print "2: "; my $b = <>; chomp $b; $start = time(); print "LCS: ", lcs($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; $start = time(); print "Brute Force: ", lcsbruteforce($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; } sub lcsbruteforce { my($x, $y) = @_; my(@v, $cx, $cy, $left, $above); for my $xi (0 .. length($x) - 1) { $cx = substr $x, $xi, 1; for my $yi (0 .. length($y) - 1) { $cy = substr $y, $yi, 1; if ($cx eq $cy) { $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0); } else { $left = ($xi && $v[$xi - 1][$yi]) || 0; $above = ($xi && $v[$xi][$yi - 1]) || 0; $v[$xi][$yi] = ($left > $above) ? $left : $above; } } } return $v[length($x) - 1][length($y) - 1]; }

Replies are listed 'Best First'.
Re^2: Longest Common SubSequence Not Working Correctly
by blokhead (Monsignor) on Nov 13, 2007 at 20:01 UTC
    Don't let the name of the subroutine fool you. The "brute force" algorithm is not really "brute forcing" the problem. A brute force approach would be to consider every possible subsequence of the strings, taking O(2min(x,y)) time.

    In fact, the "brute force" algorithm is doing the same thing as the recursive algorithm (i.e., doing the dynamic programming solution), but iteratively. It uses a standard trick for making a recursive memoizing dynamic programming algorithm iterative. Since the two algorithms solve the problem in essentially the same way, but the iterative one doesn't have the overhead of subroutine calls (which are slow in Perl), it is no surprise that the iterative one is faster.

    Usually it's easier and more intuitive to write a dynamic programming problem in terms of recursive calls. However, it's necessary to memoize the result of each recursive call, because several other subproblems might use that result of this subproblem in their computation.

    Now imagine a table that holds all of these memoized results. What happens to this table while the recursive algorithm is running? The table is gradually filling up. How does it fill up? Well, in this case, to compute the value of the subproblem ($a,$b), I need to get the solutiosn for at most these three subproblems:

    ($a,substr($b,0,-1)), (substr($a,0,-1),b), (substr($a,0,-1),substr($b,0,-1))
    In other words, I need to have those 3 cells in the table filled in before I can fill in this cell.

    So suppose I now do things iteratively instead of recursively, and just concentrate on filling up the table. I'll visit the table's cells in such a way so that I visit the cell ($a,$b) after I visit the three above cells. That way, to fill up the cell ($a,$b), I just check those 3 other cells, do some local comparisons, and I'm done. Finally, the last cell in the table is generally the answer to the "main" subproblem, and I can return that. That's exactly what this "brute force" algorithm is doing.

    blokhead

      Thanks for the explanation. How would I modify the above brute force code to be truely brute force? Also, the code only print out the length of the sequence but does not print out the characters of the sequence. I tried putting in some print statement in between but it does not seem to work correctly. Any helps?
        The lcsbruteforce algorithm maintains this big table of solutions to subproblems. In this example, it's maintaining just the length of the LCS. Just change it to maintain the actual substring itself:
        sub lcsbruteforce { my($x, $y) = @_; my(@v, $cx, $cy, $left, $above); for my $xi (0 .. length($x) - 1) { $cx = substr $x, $xi, 1; for my $yi (0 .. length($y) - 1) { $cy = substr $y, $yi, 1; if ($cx eq $cy) { # $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0); $v[$xi][$yi] = ($xi && $yi ? $v[$xi-1][$yi-1] : "") . $cx; } else { # $left = ($xi && $v[$xi - 1][$yi]) || 0; # $above = ($xi && $v[$xi][$yi - 1]) || 0; # $v[$xi][$yi] = ($left > $above) ? $left : $above; $left = ($xi && $v[$xi - 1][$yi]) || ""; $above = ($xi && $v[$xi][$yi - 1]) || ""; $v[$xi][$yi] = length($left) > length($above) ? $left : $above +; } } } return $v[length($x) - 1][length($y) - 1]; }
        To change it to an actual brute force algorithm? That would be pretty strange. The brute force algorithm is:
        $best = ""; for every subsequence $s of $x: if $s is also a subsequence of $y: $best = $s if length($s) > length($best); return $best;
        Of course, the part where you get all subsequences and check for subsequence-ness is a pain. You can probably generate all subsequences using Algorithm::Loops, and perhaps use some regex stuff to check whether a string was a subsequence of another.

        blokhead

      Thank you Blokhead. I still have lots to learn about Perl. I have another question for you. The sub lcs takes 2 parameters ($a, $b) but inside the loop where the recursive call is made, the program did a call with

      return lcs($a, $b) . $az;

      what does the . $az do? I tried printing out the $a, $b but did not see any differences. The new string is one character shorter. When I remove $az from the return, the program produce the wrong result. Thanks again for your help.
Re^2: Longest Common SubSequence Not Working Correctly
by moritz (Cardinal) on Nov 13, 2007 at 07:54 UTC
    I haven't look too close at your code, but it seems the "brute force" approach takes O(length($x) * length($y)) time.

    The recursive method is a bit harder to estimate (at least for me), I'll try it anyway. With Memoize it will run in quadratic time as well in the worst case, since lcs will be called with all possible pairs of position shifts in the arguments. Memoize will only reduce this runtime if at least one of the strings is of the from $s x $n with $n >= 2.

    But it uses many more method calls, which tend to be slow in Perl.

    To test if this explanation really works you could benchmark both subs with increasingly long strings, and test if their runtime actually evolves similarly.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://650419]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (2)
As of 2024-04-19 21:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found