PerlMonks

### Re: Analysing a (binary) string.

by GrandFather (Sage)
 on Jun 28, 2013 at 06:06 UTC (node #1041151)

in reply to Analysing a (binary) string. (Solved)

Is there a constraint on the minimum interesting substring length?

Can you treat a "short" part of the large string as representative for purposes of discovering the repeating substring?

Is there a minimum or maximum amount of "padding" between repeats?

May there be non-repeating header or tail sections in the large string?

Might a little more of the bigger picture help us understand the constraints on the problem?

True laziness is hard work

### Re^2: Analysing a (binary) string.

by BrowserUk (Pope)
 on Jun 28, 2013 at 07:02 UTC
1. Is there a constraint on the minimum interesting substring length?

If the repeat were shorter than a few tens of values, it would be visually obvious (and it isn't), so say 50 or 60 as a minimum.

2. Can you treat a "short" part of the large string as representative for purposes of discovering the repeating substring?

Certainly, as a first pass. Once the boundaries are established, it is easy to go back and check the full length.

3. Is there a minimum or maximum amount of "padding" between repeats?

No padding between repeats. (Hence "contiguous" in the OP.)

4. May there be non-repeating header or tail sections in the large string?

In the form of a partial (end) repeat at the beginning and a partial (start) repeat at the end, both obviously shorter than the repeat itself.

5. Might a little more of the bigger picture help us understand the constraints on the problem?

Not really. They really are just strings of small numbers: if they contain repeats, I can do one thing; if not, I need to do something else.

I know there are repeats in the bigger dataset of which these strings are a small subset (I've already found some of them), but the subsets I currently have may be too short to contain the two or more whole repeats I need in order to recognise them.

The really big picture would take a great deal of explaining and only lead to rambling side discussions that wouldn't help with this particular sub-problem.
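Taken together, the answers above pin the problem down fairly tightly. With no padding, and only a partial repeat at each end, the whole string is a slice of an endless repetition of the unit, so "contains a contiguous repeat of length p" is the same as "s[i] == s[i + p] for every valid i". The test cannot tell which rotation is the "true" start of the unit; any p consecutive values will do. What follows is a minimal sketch under those assumptions, not the poster's method. It swaps in the KMP prefix function to get the smallest period of a short leading window in linear time, rejects periods below the minimum interesting length (the `min_len=50` default comes from the answer to question 1) and periods the window cannot back up with two consecutive copies, then checks the candidate against the full string. All names (`smallest_period`, `has_period`, `find_repeat`, `build_sample`) and the `window=1000` default are hypothetical.

```python
from typing import List, Optional, Sequence


def smallest_period(s: Sequence[int]) -> int:
    """Smallest p with s[i] == s[i + p] for every valid i (KMP prefix function)."""
    n = len(s)
    if n == 0:
        return 0
    pi = [0] * n  # pi[i]: length of the longest proper border of s[:i + 1]
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    # The smallest period is the length minus the longest proper border.
    return n - pi[-1]


def has_period(s: Sequence[int], p: int) -> bool:
    """True if every element equals the element p positions later."""
    return all(s[i] == s[i + p] for i in range(len(s) - p))


def find_repeat(s: Sequence[int], window: int = 1000, min_len: int = 50) -> Optional[int]:
    """Length of the contiguous repeat in s, or None if none is evident.

    First pass: smallest period of a short leading window.
    Second pass: confirm that period over the whole string.
    """
    head = s[:window]
    p = smallest_period(head)
    # Reject periods below the minimum interesting length, and periods the
    # window cannot back up with two consecutive copies (needs 2 * p <= len).
    if p < min_len or 2 * p > len(head):
        return None
    return p if has_period(s, p) else None


def build_sample(unit: Sequence[int], offset: int, whole: int, tail: int) -> List[int]:
    """Hypothetical test data: partial (end) repeat, whole repeats, partial (start) repeat."""
    return list(unit[offset:]) + list(unit) * whole + list(unit[:tail])
```

For example, with `unit = list(range(60))` and `s = build_sample(unit, 17, 5, 23)`, `find_repeat(s, window=200)` returns 60, while `find_repeat(s, window=100)` returns `None` because 100 values cannot hold two copies of a 60-value repeat. That is the "too short to contain two whole repeats" situation described above. One consequence of this design: if the data ever had a period shorter than `min_len`, the sketch reports `None` rather than a longer multiple of it, which is reasonable only because such short repeats would already be visible by eye.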

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
