Re: Re: finding longest common substring (ALL common substrings)

by revdiablo (Prior)
on Nov 20, 2003 at 17:55 UTC (#308630)

in reply to Re: finding longest common substring (ALL common substrings)
in thread finding longest common substring

This is indeed interesting. With my test data it's actually a bit faster than my original version (though we've seen how much the data affects the various algorithms). Pretty impressive, considering what it does. There appears to be a problem, however. It returns undef if you feed it qw(foo bor boz bzo), but works fine with qw(foo boor booz bzoo) and qw(fo bor boz bzo). So it seems to return undef whenever the strings contain differing numbers of o's. I don't see why offhand; maybe you have some ideas?

Re: Re: Re: finding longest common substring (ALL common substrings)
by BrowserUk (Pope) on Nov 20, 2003 at 22:29 UTC

    Sorry. The code is flawed. It does produce all the common substrings, but it will often select the wrong "longest".

    The problem occurs because if a substring occurs twice in one of the input strings and not at all in one of the others, its count will be the same as if it had appeared once in both. The selection mechanism (the longest key whose count equals the number of input strings) is bogus, but sufficiently convincing that it worked for all 5 sets of test data I tried it on!
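The counting flaw is easy to reproduce. The thread's code isn't quoted here, so the following is a minimal Python sketch (not the original Perl) that assumes the scheme works as described: every occurrence of every substring is tallied across all inputs, and the longest substring whose tally equals the number of inputs wins. The function names are hypothetical. The second function counts each distinct substring at most once per input string, which is the "how many strings contain it" count the selection actually needs.

```python
from collections import Counter


def all_substrings(s):
    """Yield every substring occurrence of s, overlapping occurrences included."""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            yield s[i:j]


def longest_common_flawed(strings):
    # Hypothetical reconstruction of the flawed scheme: tally every occurrence
    # across all inputs, then pick the longest substring whose total tally
    # equals the number of inputs.
    counts = Counter()
    for s in strings:
        counts.update(all_substrings(s))
    hits = [k for k, c in counts.items() if c == len(strings)]
    return max(hits, key=len) if hits else None


def longest_common_fixed(strings):
    # Hypothetical fix: count each distinct substring at most once per input,
    # so the tally means "number of input strings containing it".
    counts = Counter()
    for s in strings:
        counts.update(set(all_substrings(s)))
    hits = [k for k, c in counts.items() if c == len(strings)]
    return max(hits, key=len) if hits else None
```

With these sketches, `longest_common_flawed(["foo", "bor", "boz", "bzo"])` returns `None` because "o" is tallied five times (twice from "foo"), while `longest_common_flawed(["abab", "xyz"])` returns "ab" even though the two strings share nothing. The per-string version returns "o" and `None` respectively.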

    I'm trying to think of an efficient way of counting how many of the original strings each substring is found in, but the only one I've come up with so far would limit the number of input strings to 32. A couple of other ideas I tried worked, but carried enough overhead to make the method less interesting.
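The 32-string limit suggests one bit per input string packed into a fixed-width integer, though that is an assumption about the idea described. A minimal Python sketch of such a bit-vector count follows; the helper names are hypothetical. Because OR-ing the same bit in twice changes nothing, repeated occurrences within one string cannot inflate the result, and a substring is common exactly when all n bits are set. Python integers are unbounded, so this sketch has no 32-input cap; a native 32-bit integer would.

```python
def common_substrings_bitmask(strings):
    """Return the substrings present in every input, using one bit per input.

    Hypothetical helper for illustration (not the thread's code): bit i of a
    substring's mask is set if the substring occurs anywhere in strings[i].
    """
    masks = {}
    for i, s in enumerate(strings):
        bit = 1 << i
        for start in range(len(s)):
            for end in range(start + 1, len(s) + 1):
                sub = s[start:end]
                # OR is idempotent: a second occurrence in the same string
                # sets the same bit again and changes nothing.
                masks[sub] = masks.get(sub, 0) | bit
    full = (1 << len(strings)) - 1  # all n bits set
    return [sub for sub, m in masks.items() if m == full]


def longest_common_bitmask(strings):
    # Hypothetical wrapper: longest fully-shared substring, or None.
    common = common_substrings_bitmask(strings)
    return max(common, key=len) if common else None
```

For example, `longest_common_bitmask(["foo", "bor", "boz", "bzo"])` returns "o". The trade-off is memory: every distinct substring of every input gets a dictionary entry, which grows quadratically with string length.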

    I'll keep looking at it, but maybe my "surprise at the simplicity and efficiency" was the red flag that should have told me that I was missing something! Still, nothing ventured, nothing gained.

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
