
Re^2: Analysing a (binary) string.

by BrowserUk (Pope)
on Jun 28, 2013 at 07:02 UTC (#1041157)

in reply to Re: Analysing a (binary) string.
in thread Analysing a (binary) string. (Solved)

  1. Is there a constraint on the minimum interesting substring length?

    If the repeating substring were shorter than a few tens of elements, the pattern would be obvious visually (and it isn't), so say 50 or 60 as a minimum.

  2. Can you treat a "short" part of the large string as representative for purposes of discovering the repeating substring?

    Certainly as a first pass. Once boundaries are established it is easy to go back and check the full length.

  3. Is there a minimum or maximum amount of "padding" between repeats?

    No padding between repeats. (Hence "contiguous" in the OP.)

  4. May there be non-repeating header or tail sections in the large string?

    Yes, in the form of a partial repeat (its end) at the beginning and a partial repeat (its start) at the end, both obviously shorter than the length of the repeat.

  5. Maybe a little more of the bigger picture may help understand the constraints on the problem?

    Not really. They really are just strings of small numbers. If they contain repeats, I can do one thing; if not, I need to do something else.

    I know there are repeats in the bigger dataset of which these strings are a small subset (I've already found some of them), but it may be that the subsets I currently have are too short to contain the two or more whole repeats that I need in order to recognise them.

    The really big picture would take a great deal of explaining and only lead to rambling side discussions that wouldn't help with this particular sub-problem.
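
Taken together, the answers above (no padding, a partial repeat at each end, at least two whole repeats, a minimum length of about 50) mean the string as a whole should be periodic: every element equals the one a full repeat length further on. A minimal sketch of that check follows, in Python purely for illustration. The function names, the default minimum of 50 and the sample length are assumptions made for the example, not anything from the thread.

```python
def find_period(s, min_len=50):
    """Return the smallest p >= min_len with s[i] == s[i + p] for all valid i.

    Only periods with 2 * p <= len(s) are considered, so the string is long
    enough to hold two whole repeats. Returns None if no such p exists.
    """
    n = len(s)
    for p in range(min_len, n // 2 + 1):
        # s shifted left by p must match s itself over the overlapping part.
        if s[p:] == s[:n - p]:
            return p
    return None


def find_period_two_pass(s, sample_len=1000, min_len=50):
    """First pass on a short prefix, then confirm against the full string.

    A candidate that fails the full-length check is simply rejected here;
    no further candidates are tried.
    """
    p = find_period(s[:sample_len], min_len)
    if p is not None and s[p:] == s[:len(s) - p]:
        return p
    return None


if __name__ == "__main__":
    unit = bytes(range(60))                   # hypothetical 60-element repeat
    data = unit[45:] + unit * 3 + unit[:20]   # partial end + 3 repeats + partial start
    print(find_period(data))
    print(find_period_two_pass(data, sample_len=130))
```

The slice comparison usually fails on the first mismatch, so it is cheap for most candidate lengths, but the worst case is quadratic in the string length. Note also that a string whose true period is shorter than `min_len` will be reported with a multiple of that period.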

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
