Re^5: Search for identical substrings
by BrowserUk (Pope) on Aug 20, 2005 at 00:49 UTC
Here are my results for the 6 sequences you posted on your scratchpad. Is this a "constructed dataset"? I ask because every LCS is found at the same offset in both of the sequences in which it occurs. I thought this was a bug when I first saw it, but it doesn't happen with any of my own test data.
There were no duplicate equal-length matches. Some of the LCSs shown below are truncated for posting, but the length, the (offsets) and the first 80 or so characters should be enough to verify the results. Confirmation (or otherwise) would be nice to have.
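For anyone wanting to spot-check the lengths and offsets independently, the textbook dynamic-programming algorithm for the longest common substring of two strings is enough. This is a minimal Python sketch of that standard technique, not the method used to produce the results in this post; the function name `longest_common_substring` is hypothetical, and offsets are assumed to be 0-based, as with Perl's `substr`. It also returns every start position that achieves the maximal length, so duplicate equal-length matches would show up.

```python
from typing import List, Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (length, matches), where matches lists every (offset_in_a, offset_in_b)
    at which a common substring of the maximal length starts (0-based offsets)."""
    best = 0
    matches: List[Tuple[int, int]] = []
    # prev[j] = length of the common run ending at a[i-2] and b[j-1] (previous row)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        ai = a[i - 1]
        for j in range(1, len(b) + 1):
            if ai == b[j - 1]:
                run = prev[j - 1] + 1          # extend the diagonal run by one char
                cur[j] = run
                if run > best:                 # new longest: discard older matches
                    best = run
                    matches = [(i - run, j - run)]
                elif run == best:              # another match of the same length
                    matches.append((i - run, j - run))
        prev = cur
    return best, matches
```

It costs O(len(a) × len(b)) time, about 9 million cell updates for one 3k × 3k pair, which is slow in pure Python but fine for verifying a handful of pairs. A result whose two offsets are equal, such as `(k, k)`, is the same-offset pattern described above.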
If this data is representative, the time taken for the 15 pairings projects to a total runtime of around 58 hours for your 300x3k dataset. Somewhat more palatable than 3 years :)
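The projection follows from the number of pairings: n sequences give n(n-1)/2 pairs, so 6 sequences give 15 and 300 give 44,850. A minimal Python sketch of that scaling, assuming runtime is simply proportional to the number of pairs (the function name `projected_hours` is hypothetical):

```python
from math import comb  # Python 3.8+


def projected_hours(sample_seconds: float, sample_seqs: int, total_seqs: int) -> float:
    """Scale a measured all-pairs runtime up to a larger set of sequences.

    Assumes every pairwise comparison costs about the same, i.e. that all
    sequences are of similar length, so runtime grows with the pair count.
    """
    per_pair = sample_seconds / comb(sample_seqs, 2)  # comb(6, 2) == 15 pairings
    return per_pair * comb(total_seqs, 2) / 3600      # comb(300, 2) == 44_850 pairings
```

For example, `projected_hours(70, 6, 300)` gives about 58.1 hours. Since 300 sequences form 2,990 times as many pairs as 6 do, a 58-hour projection implies roughly 70 seconds for the 15 sample pairings, if the per-pair cost really is uniform. That 70-second figure is back-calculated here, not a timing reported in the post.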
Had you only wanted the single longest common string in the dataset, I could do that in under 6 hours.
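Finding only the single longest string shared by any two sequences is an easier problem than finding the LCS of every pair, because there is no need to report a result for each pairing. The post doesn't say how the under-6-hours figure is achieved, so this sketch swaps in a different, standard technique: binary search on the match length k, where each step checks whether any length-k substring occurs in two different sequences. The search is valid because if a length-k string is shared, so is every shorter prefix of it. Both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[int, int, int, int]  # (seq_a, offset_a, seq_b, offset_b), 0-based


def shared_kmer(seqs: List[str], k: int) -> Optional[Hit]:
    """Return one length-k substring shared by two different sequences, or None."""
    first_seen: Dict[str, Tuple[int, int]] = {}
    for s, seq in enumerate(seqs):
        for off in range(len(seq) - k + 1):
            kmer = seq[off:off + k]
            prev = first_seen.get(kmer)
            if prev is None:
                first_seen[kmer] = (s, off)      # remember first occurrence only
            elif prev[0] != s:                   # seen before in another sequence
                return prev[0], prev[1], s, off
    return None


def longest_shared_substring(seqs: List[str]) -> Tuple[int, Optional[Hit]]:
    """Binary search for the greatest k such that some k-mer occurs in two sequences."""
    lo, hi = 0, max((len(s) for s in seqs), default=0)
    best: Optional[Hit] = None
    while lo < hi:
        mid = (lo + hi + 1) // 2                 # round up so the loop always shrinks
        hit = shared_kmer(seqs, mid)
        if hit is None:
            hi = mid - 1                         # no shared string this long
        else:
            lo, best = mid, hit                  # length mid is achievable
    return lo, best
```

For 300 sequences of about 3k characters, each check makes one pass over roughly 900k substrings, and about a dozen checks cover lengths up to 3,000. Keying the dictionary on the substrings themselves can use a lot of memory for large k; a rolling hash (Rabin-Karp), with a verification step on collisions, would reduce that.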
Update: The offsets originally shown were all 10 too high because I had failed to strip the sequence labels. This has now been corrected.