### Re^2: Finding repeat sequences.

by BrowserUk (Pope)
 on Jun 21, 2013 at 01:24 UTC

in reply to Re: Finding repeat sequences.

and the task is to find maximum \$pattern to fit these constraints?

Um. I cannot see any errors in that. So yes.

If yes, some simple mathematics should already considerably minimize the set of possible combinations you need to investigate with regexes.

Hm. A realistic, but relatively small, example from my test harness:

```
b:64000 in s: 640028748        hdb :: 24.290438 s
```

L=64,000, N=10,000, K=28,740.

But those could equally well be: L=16,000, N = 40,001, K=12,740; or (thousands*) of other permutations.

I don't think it helps.

(*I'm being very, very conservative; my best guess is 100s of millions.)
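
To put a rough number on that guess, here is a minimal sketch. It assumes the constraints are: total length = L*N + K, the pattern occurs at least twice (N >= 2), and the tail is shorter than the pattern (0 <= K < L). Those constraints, and the helper name `decompositions`, are assumptions made for illustration, not something taken from the test harness. Under them, every L from 1 to half the total length yields exactly one (N, K) pair.

```perl
use strict;
use warnings;

# Hypothetical helper: list every (L, N, K) with L*N + K == $m,
# assuming N >= 2 and 0 <= K < L.
sub decompositions {
    my ($m) = @_;
    my @found;
    for my $l (1 .. int($m / 2)) {          # L <= m/2 guarantees N >= 2
        push @found, [ $l, int($m / $l), $m % $l ];
    }
    return @found;
}

# Small demonstration on a 26-character string length.
my @d = decompositions(26);
printf "L=%d N=%d K=%d\n", @$_ for @d;
print scalar(@d), " candidate decompositions\n";
```

Since the count is simply int(length / 2), a string of roughly 640 million characters has roughly 320 million length-only candidates, which is why the lengths alone do not narrow things down.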

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

### Re^3: Finding repeat sequences.

by LanX (Chancellor) on Jun 21, 2013 at 02:22 UTC

Here is a regex solution that works for the shortest possible tail of length k:

```
  DB<127> $str
=> "abcdabcdabceabcdabcdabceab"

  DB<128> $str=~/^((.+?).*)\2$/; $rest=$1, $tail=$2
=> ("abcdabcdabceabcdabcdabce", "ab")

  DB<129> $rest =~ /^(.+?)\1*$/; $1
=> "abcdabcdabce"
```

This needs to be extended for longer possible tails.

But given the dimensions of your data, I doubt that regexes are appropriate.

You could test all \$patterns which repeat at least once (or x times): calculate \$k = \$m % \$l, with \$m = length(\$str) and \$l the length of the \$pattern, check whether \$str starts and ends with the same substring \$tail of length \$k, and then check whether the pattern continues repeating.

Or start by eliminating all possible \$tails and check whether the length \$l of a repeating pattern is a divisor of the length of the \$rest.
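
The first of these two approaches can be sketched without regexes roughly as follows. This is only an illustration under assumptions: the sub name `shortest_repeat` is made up, "repeats at least once" is read as "at least two full copies", and it returns the shortest qualifying pattern, which may not be the one the original task wants. It has not been run against the test harness.

```perl
use strict;
use warnings;

# Hypothetical sketch: return the shortest pattern such that $str is the
# pattern repeated two or more times, followed by a (possibly empty)
# prefix of the pattern. Returns undef (in scalar context) if none exists.
sub shortest_repeat {
    my ($str) = @_;
    my $m = length $str;
    for my $l (1 .. int($m / 2)) {       # at least two full copies
        my $k = $m % $l;                 # length of the trailing partial copy
        # Cheap filter: the tail must equal the first $k characters.
        next if $k && substr($str, 0, $k) ne substr($str, -$k);
        # The pattern keeps repeating iff $str equals itself shifted by $l.
        return substr($str, 0, $l)
            if substr($str, 0, $m - $l) eq substr($str, $l);
    }
    return;
}

my $pattern = shortest_repeat("abcdabcdabceabcdabcdabceab");
print defined $pattern ? "$pattern\n" : "no repeat found\n";
```

Each candidate length costs up to one full-length string comparison, so the worst case is quadratic in length(\$str); on strings of hundreds of millions of characters that would need a smarter search, so treat this as a statement of the idea rather than a competitive solution.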

Had no time to check all the other posted solutions and don't wanna reinvent the wheel, so I better stop here! =)

HTH

Cheers Rolf

( addicted to the Perl Programming Language)

That looks suspiciously like a close variation on choroba's attempt.

Had no time to check all the other posted solutions and don't wanna reinvent the wheel,

All the tested solutions, along with how they fared in my test harness, are nicely grouped together in Re: Finding repeat sequences. (Results:Part 1).

