Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked

Re: diff of two strings

by sfink (Deacon)
on Jan 09, 2008 at 08:29 UTC ( #661285=note: print w/replies, xml ) Need Help??

in reply to diff of two strings

As described, I don't think there is a unique solution. Do you just want any valid solution, or one with some sort of minimum cost, or what?

Consider (I'll use single letters to stand in for words): ABA vs BAA: <A>BA,BA<A>? Or [A][B]A,[B][A]A?

I'm guessing you don't care too much, as long as there isn't an easy way to remove markup from the result (i.e., it's ok to return a local minimum even if it isn't the global minimum).

For your algorithm above, I think what you really want is a suffix tree (of words) instead of the dynamic programming algorithm, but beyond that, my brain is too sleepy to be of much use. You'd find the LCSS, remove it from the suffix tree in some sense, and then I think you'd be able to immediately iterate to find the remaining LCSSes until you run out. I think the removal and all other operations should be pretty quick.

Without a suffix tree... how slow would it be to run your existing LCSS algorithm, then replace the found LCSS with a different dummy token in each string and repeat? Or is that what you're already doing?

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://661285]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (12)
As of 2016-10-24 18:29 GMT
Find Nodes?
    Voting Booth?
    How many different varieties (color, size, etc) of socks do you have in your sock drawer?

    Results (309 votes). Check out past polls.