Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: Analysing a (binary) string. (Solved)

by hexcoder (Hermit)
on Jun 28, 2013 at 13:36 UTC ( #1041254=note: print w/ replies, xml ) Need Help??


in reply to Analysing a (binary) string. (Solved)

Hi,

just for completeness (since its solved). A suitable data structure could be that of a suffix tree or a suffix array. It allows finding the longest repeating substring(s) in linear time.

I found this page instructive: http://www.allisons.org/ll/AlgDS/Tree/Suffix/.

I am using it in the detection of cut-and-paste code.


Comment on Re: Analysing a (binary) string. (Solved)
Re^2: Analysing a (binary) string. (Solved)
by BrowserUk (Pope) on Jun 28, 2013 at 14:08 UTC

    Two questions:

    1. Have you ever built a suffix (or any kind of) tree in Perl with 30 million nodes?
    2. What effect do you think the presence of errors in the repeats would have upon the suffix trees ability to detect them?

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      1. Not yet, but a suffix array has lower space requirements than a suffix tree.

      2. In Dan Gusfield's book 'Algorithms on Strings, Trees, and Sequences' several inexact matching solutions based on suffix trees are discussed (for example in chapter 12.2). If there are up to k errors without insertions and deletions, it is the k-mismatch problem (see chap. 9.4) with a solution running in O(km), where k is the number of errors and m is the length of the string. Otherwise if the errors shall include insertions/deletions it is called the k-difference problem.

      Hope that helps.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1041254]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (10)
As of 2014-07-22 11:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (110 votes), past polls