As described, I don't think there is a unique solution. Do you just want any valid solution, or one with some sort of minimum cost, or what?
in reply to diff of two strings
Consider (I'll use single letters to stand in for words): ABA vs BAA: <A>BA,BA<A>? Or [A][B]A,[B][A]A?
I'm guessing you don't care too much, as long as there isn't an easy way to remove markup from the result (i.e., it's ok to return a local minimum even if it isn't the global minimum).
For your algorithm above, I think what you really want is a suffix tree (of words) instead of the dynamic programming algorithm, but beyond that, my brain is too sleepy to be of much use. You'd find the LCSS, remove it from the suffix tree in some sense, and then I think you'd be able to immediately iterate to find the remaining LCSSes until you run out. I think the removal and all other operations should be pretty quick.
Without a suffix tree... how slow would it be to run your existing LCSS algorithm, then replace the found LCSS with a different dummy token in each string and repeat? Or is that what you're already doing?