in reply to Challenge: Fast Common Substrings
Just for the sake of completeness: a fast and elegant algorithm for this makes clever use of suffix trees. One concatenates the two strings of length n and m, using separator characters that occur in neither string, say abcdef%efgab$. A suffix tree of this string can be constructed in O(n+m) time (Ukkonen's algorithm). To find the common substrings, one then searches for internal nodes whose subtrees contain leaves belonging to both words (or, with more than two strings, to all of them).
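To make the idea concrete, here is a minimal sketch in Python. It is not Ukkonen's algorithm: it builds a plain, uncompressed suffix trie in quadratic time and space, which is only meant to show the "mark each node with the words it belongs to" step. The function name and the dict layout are made up for illustration, and the inputs are assumed to contain neither % nor $. Suffixes are cut off at the separator, which loses nothing here because % occurs only once, so no node below it could be shared.

```python
def longest_common_substrings(s1, s2):
    """Longest common substrings of s1 and s2 via a naive suffix trie."""
    text = s1 + "%" + s2 + "$"      # assumption: % and $ occur in neither input
    boundary = len(s1)              # 0-based index of the separator
    root = {"kids": {}, "owners": set()}

    # Insert every suffix, tagging each node with the word (0 or 1)
    # in which the suffix starts.
    for start in range(len(text)):
        owner = 0 if start < boundary else 1
        node = root
        for ch in text[start:]:
            if ch in "%$":          # stop at a separator
                break
            node = node["kids"].setdefault(ch, {"kids": {}, "owners": set()})
            node["owners"].add(owner)

    # Walk only through nodes reached from both words, keeping the deepest.
    best, found = 0, []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, kid in node["kids"].items():
            if len(kid["owners"]) == 2:
                word = prefix + ch
                if len(word) > best:
                    best, found = len(word), [word]
                elif len(word) == best:
                    found.append(word)
                stack.append((kid, word))
    return sorted(found)


print(longest_common_substrings("abcdef", "efgab"))  # ['ab', 'ef']
```

A real suffix tree compresses the single-child chains of this trie into edges, which is what gets the construction down to linear size.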
The resulting suffix tree for "abcdef" and "efgab" (the number on each edge is the 1-based position in abcdef%efgab$ where that edge label starts):

    tree:|
         |(1:ab)|
         |      |(3:cdef%efgab$)|leaf
         |      |(13:$)|leaf
         |(2:b)|
         |     |(3:cdef%efgab$)|leaf
         |     |(13:$)|leaf
         |(3:cdef%efgab$)|leaf
         |(4:def%efgab$)|leaf
         |(5:ef)|
         |      |(7:%efgab$)|leaf
         |      |(10:gab$)|leaf
         |(6:f)|
         |     |(7:%efgab$)|leaf
         |     |(10:gab$)|leaf
         |(7:%efgab$)|leaf
         |(10:gab$)|leaf

So "ab" has two leaves in different words (position <= 7 for leaf 1 and position > 7 for leaf 2). The same holds for 'b', 'ef' and 'f'.
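The four nodes named above can be cross-checked without building the tree at all. A substring is an explicit internal node of the suffix tree exactly when it is followed by at least two different characters somewhere in the concatenated string, and it is shared when it starts on both sides of the separator. The sketch below is a brute-force check under that observation, not an efficient algorithm. The function name is hypothetical, and it compares substring start positions against the separator position rather than the leaf-edge positions used in the post, which puts each occurrence on the same side.

```python
from collections import defaultdict


def shared_branching_nodes(s1, s2):
    """Substrings that form internal suffix-tree nodes with leaves in both words."""
    text = s1 + "%" + s2 + "$"      # assumption: % and $ occur in neither input
    split = len(s1) + 1             # 1-based position of '%', 7 for "abcdef"
    starts = defaultdict(list)      # substring -> 1-based start positions
    followers = defaultdict(set)    # substring -> characters seen right after it

    for i in range(len(text)):
        for j in range(i + 1, len(text)):
            w = text[i:j]
            if "%" in w or "$" in w:    # substrings may not cross a separator
                break
            starts[w].append(i + 1)
            followers[w].add(text[j])

    return sorted(
        w for w, pos in starts.items()
        if len(followers[w]) >= 2              # branching, so an explicit node
        and any(p <= split for p in pos)       # occurs in the first word
        and any(p > split for p in pos)        # occurs in the second word
    )


print(shared_branching_nodes("abcdef", "efgab"))  # ['ab', 'b', 'ef', 'f']
```

Note that "a" and "e" are common substrings too, but they are always followed by the same character ("b" and "f" respectively), so they sit in the middle of an edge rather than at a node.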
http://en.wikipedia.org/wiki/Longest_common_substring_problem
Update: Just found some Perl code with Google ... on PerlMonks ;) See Re: finding longest common substring