go ahead... be a heretic PerlMonks

### Re: Analysing a (binary) string. (Solved)

by hexcoder (Chaplain)
 on Jun 28, 2013 at 13:36 UTC ( #1041254=note: print w/replies, xml ) Need Help??

in reply to Analysing a (binary) string. (Solved)

Hi,

just for completeness (since its solved). A suitable data structure could be that of a suffix tree or a suffix array. It allows finding the longest repeating substring(s) in linear time.

I am using it in the detection of cut-and-paste code.

• Comment on Re: Analysing a (binary) string. (Solved)

Replies are listed 'Best First'.
Re^2: Analysing a (binary) string. (Solved)
by BrowserUk (Pope) on Jun 28, 2013 at 14:08 UTC

Two questions:

1. Have you ever built a suffix (or any kind of) tree in Perl with 30 million nodes?
2. What effect do you think the presence of errors in the repeats would have upon the suffix trees ability to detect them?

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
1. Not yet, but a suffix array has lower space requirements than a suffix tree.

2. In Dan Gusfield's book 'Algorithms on Strings, Trees, and Sequences' several inexact matching solutions based on suffix trees are discussed (for example in chapter 12.2). If there are up to k errors without insertions and deletions, it is the k-mismatch problem (see chap. 9.4) with a solution running in O(km), where k is the number of errors and m is the length of the string. Otherwise if the errors shall include insertions/deletions it is called the k-difference problem.

Hope that helps.

Create A New User
Node Status?
node history
Node Type: note [id://1041254]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2017-06-27 02:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
How many monitors do you use while coding?

Results (598 votes). Check out past polls.