Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re^2: Challenge: Fast Common Substrings

by blokhead (Monsignor)
on Apr 04, 2007 at 16:02 UTC ( [id://608313]=note: print w/replies, xml ) Need Help??


in reply to Re: Challenge: Fast Common Substrings
in thread Challenge: Fast Common Substrings

++ Wow, thank you for introducing me to suffix trees. What an interesting concept, and how refreshing to see a linear-time algorithm for constructing such a creature. I see you've used the javascript applet at this page, which others may want to check out.

However, I'd like to slightly revise the algorithm you outlined. Consider the following example:

string = ababc%bc$ | |(3:abc%bc$)|leaf |(1:ab)| | |(5:c%bc$)|leaf tree:| | |(3:abc%bc$)|leaf |(2:b)| | | |(6:%bc$)|leaf | |(5:c)| | | |(9:$)|leaf | | |(6:%bc$)|leaf |(5:c)| | |(9:$)|leaf | |(6:%bc$)|leaf | |(9:$)|leaf
"ab" appears twice in the first string, and so it gives a node with two leaves. The actual condition you should check is whether a node has one leaf containing the % separator and another leaf without the % symbol.

blokhead

Replies are listed 'Best First'.
Re^3: Challenge: Fast Common Substrings (O(n)?)
by tye (Sage) on Apr 04, 2007 at 21:53 UTC

    The page you link to mentions being able to build them in O(n) but then only really describes how to go from a suffix tree for string $x to one for string $x.$c (1==length$c) in O(length $x). Using that algorithm would require O(N*N) to build the suffix tree for a string of length N.

    So I'm not sure I believe the O(N) claim for building the whole suffix tree based on that page.

    - tye        

      The naive algorithm requires O(N*N). The Ukkonen algorithm needs only O(N). If you want to understand it - it is not trivial - I recommend Gusfields book (Algorithms on Strings,...).
Re^3: Challenge: Fast Common Substrings
by lima1 (Curate) on Apr 04, 2007 at 16:25 UTC
    Or even easier: check the positions of the substrings (<=7 and > 7 in my example).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://608313]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (9)
As of 2024-04-18 15:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found