Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Re^2: Finding repeat sequences.

by BrowserUk (Pope)
on Jun 18, 2013 at 23:01 UTC ( #1039689=note: print w/replies, xml ) Need Help??

in reply to Re: Finding repeat sequences.
in thread Finding repeat sequences.

That's interesting...(not faint praise.). (I'm already thinking of optimisations; like build the longest repeat sequence and then use substr to get shorter versions.)

For example, given the input abcabca, it could be ... So the only interesting answer to return is the shortest possible one.

Hm. Damn you for making me think (again) at this time of night :)

There will always be at least one complete substring.

If there is more than 1 but less than 2, ie. 1 rep + 1 partial; (I believe) it will always be possible to determine the longest < length string match; because the residual always matches the length( residual ) first characters of the string.

So, if the string is 'abcabca'; the rep could be 'abcabc' or 'abc'. But if the rep consists entirely of an exact integer number of reps of a subsubstring, then the substring is that subsubstring and the string consists of rep*n(n>1) + a partial.

Thus, I believe that there is only ever one results.

It will be interesting to pitch your solution against DamianConway's regex and see how they compare. I simply have no feel for it; but it's a job for tomorrow.

Thank you.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Finding repeat sequences.
by hdb (Prior) on Jun 19, 2013 at 07:46 UTC

    UPDATE: WARNING: the following code does not work in all circumstances. Sorry!

    Here is a variant of tobyink's solution that uses index to look ahead when the current candidate string repeats and then enlengthens (is that an English word?) it accordingly.

    sub find_substring { my $input = shift; my $length = length $input; my $i = 0; my $possible; while( 1 ) { $possible = substr $input, 0, $i+1; # increase length by 1 $i = index $input, $possible, $i+1; # find next occurence of c +andidate return $input if $i < 0; # if not found return full + string => no repetition $possible = substr $input, 0, $i; # this is the minimum leng +th candidate return $possible if $input eq substr($possible x (1 + int($len +gth / $i)), 0, $length); # success } }

    UPDATE: Eily's solution below Re^3: Finding repeat sequences. can be used to avoid the construction of the repeated string (as it is the same just with offset). Therefore, this works even better:

    sub find_substring { my $input = shift; my $length = length $input; my $i = 0; while( 1 ) { $i = index( $input, substr( $input, 0, $i+1 ), $i+1); return $input if $i < 0; return substr( $input, 0, $i) if substr( $input, $i ) eq substr($i +nput, 0, $length - $i); } }

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039689]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (2)
As of 2017-07-22 23:24 GMT
Find Nodes?
    Voting Booth?
    I came, I saw, I ...

    Results (342 votes). Check out past polls.