Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re^2: Finding repeat sequences.

by sundialsvc4 (Abbot)
on Jun 19, 2013 at 04:22 UTC ( #1039719=note: print w/replies, xml ) Need Help??

in reply to Re: Finding repeat sequences.
in thread Finding repeat sequences.

It might even be easier than this ... (pseudocode this time)

for $t from 1 to int(length(string) / 2): next unless the first $t chars equal the last $t; # i.e. "$t" must be a plausible tail-size. $r = length - $t; $incr = int($r / 2); while( $incr > 0) { next unless ($incr % $r == 0); # i.e. must be a possible repeated-block size in this space solution is found if string(0, $incr) eq string($incr, $incr); # i.e. if we find one repetition then it must be the answer. $incr--; } }
As before, start by looking for a tail. If you think you found one, look for repetitions in possibly-sized increments of the remaining string. This approach will consider all successively-larger candidates for the "tail," up to and including the largest tail, which is "half of it." For each tail-size, it looks for one repetition of all possible sizes which would fit evenly within this area, knowing that the left-hand portion is defined to consist only of repetitions. Continue for all tail-sizes, and for each of these, all rep-sizes, until the first success is found. It works only for data which does, in fact, match this assertion.

Replies are listed 'Best First'.
Re^3: Finding repeat sequences.
by Eily (Vicar) on Jun 19, 2013 at 12:40 UTC

    Oh! I didn't think of that one. But I'm not sure why you have to do all those extra verifications. Since the input data is a repeated pattern, the length of that pattern is how little you can shift your string to the left and have a perfect match.

    $s = 'abcdabcdabceabcdabcdabceab'; for (1..length($s)-1) { print substr($s,0,$_) and last if substr($s,$_) eq substr($s,0 +,length($s)-$_); }
Re^3: Finding repeat sequences.
by sundialsvc4 (Abbot) on Jun 19, 2013 at 13:05 UTC

    I think that the pseudo-coded solution is the strongest one, if it is made to stop after finding the longest possible substring (then last), and if it is allowed to find all possible (reasonably-long) tails, once again starting with the longest one.   (These would be what a human being would probably consider to be the best “right answers.”)

    With very-short tails, as written it might produce wrong answers if it considers only the first two occurrences (consider string 'aaba' if the tail were merely 'a' .. incorrect).   But the essential idea, I think, is still valid.

    One reason why I wrote it this way was in an effort to avoid “churning” memory when dealing with exceptionally-long strings.   You don’t need to consider any string that isn’t a tail, nor, within the head, any repeated-string candidate that won’t fit.   That subdivides the problem into two searches, both of which have only a few possibilities each.   It might not yet be bug-free, but it ought to be close . . .

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039719]
[erix]: heh, understatement of the week :)
[erix]: well, I hope you're right
[Your Mother]: Bill Clinton assaulted and raped repeatedly. Even when it was hinted and finally all but proved, no one cared because his sexism, and racism, and killing was decorous.
[Your Mother]: I think Trump might be better specifically because no one respects him so he won't get his way the way the others did.
[Your Mother]: (I certainly did not vote for him and have been an outspoken critic of his for three decades, by the by.)
[erix]: I'd like Russia threatened a bit more, flattered a bit less
[Your Mother]: The last two regimes killed a million, displaced 10 million, torture prisons, and ruined countries… somehow it's magically bad _now_.
[erix]: what/who are the last two regimes?
[Your Mother]: Bush/Obama. 15 straight years of warfare, torture, etc, etc.
[Your Mother]: All caused by the political fallout of Carter/Reagan when the Afghan chickens came home to roost on 9/11.

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2017-01-20 01:03 GMT
Find Nodes?
    Voting Booth?
    Do you watch meteor showers?

    Results (173 votes). Check out past polls.