Re: Finding repeat sequences.

by LanX (Bishop)
on Jun 21, 2013 at 01:03 UTC (#1040046)

in reply to Finding repeat sequences.

Still struggling to understand the task...

Is

$str = ($pattern x $n) . substr($pattern,0,$k)

with

0 <= $k < ($l = length($pattern))

and the task is to find the maximum $pattern for a given $str to fit these constraints?

If yes, some simple mathematics should already considerably minimize the set of possible combinations you need to investigate with regexes.

Cheers Rolf

( addicted to the Perl Programming Language)

test
```
  DB<109> $pattern='abcdabcdabce'
=> "abcdabcdabce"

  DB<110> $n=2,$k=2
=> (2, 2)

  DB<111> $str = ($pattern x $n ) . substr($pattern,0,$k)
=> "abcdabcdabceabcdabcdabceab"

  DB<112> $str eq 'abcdabcdabceabcdabcdabceab'
=> 1
```

Re^2: Finding repeat sequences.
by hdb (Monsignor) on Jun 21, 2013 at 10:54 UTC

It is to find the shortest pattern, otherwise $n==1 always.

Correction: replaced $n=1 with $n==1

Re^2: Finding repeat sequences.
by BrowserUk (Pope) on Jun 21, 2013 at 01:24 UTC
and the task is to find maximum $pattern to fit these constraints?

Um. I cannot see any errors in that. So yes.

If yes, some simple mathematics should already considerably minimize the set of possible combinations you need to investigate with regexes.

Hm. A realistic, but relatively small, example from my test harness:

```
b:64000 in s: 640028748        hdb :: 24.290438 s
```

L=64000, N = 10,000, K=28,740.

But those could equally well be: L=16,000, N = 40,001, K=12,740; or (thousands*) of other permutations.

I don't think it helps.

(*I'm being very, very conservative; my best guess is 100s of millions.)
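To make the counting concrete, here is a small sketch of how many (L, N, K) triples the length alone allows. It is only an illustration: the sub name `decompositions` is made up, and it uses the 26-character example string from earlier in the thread rather than the harness data. Every L from 1 to the string length yields exactly one triple with 0 <= K < L, so the arithmetic by itself rules nothing out.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): list every [L, N, K]
# with $m == L * N + K and 0 <= K < L, for a string of length $m.
sub decompositions {
    my ($m) = @_;
    return map { [ $_, int( $m / $_ ), $m % $_ ] } 1 .. $m;
}

my @d = decompositions(26);    # 26 = length of 'abcdabcdabceabcdabcdabceab'
printf "L=%d N=%d K=%d\n", @$_ for @d;
printf "%d candidate triples\n", scalar @d;
```

The triple actually used to build that string, L=12, N=2, K=2, is just one of the 26 listed; the string's content has to be inspected to pick it out.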

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Here is a regex solution which works for the shortest possible tail of length $k:

```
  DB<127> $str
=> "abcdabcdabceabcdabcdabceab"

  DB<128> $str=~/^((.+?).*)\2$/; $rest=$1, $tail=$2
=> ("abcdabcdabceabcdabcdabce", "ab")

  DB<129> $rest =~ /^(.+?)\1*$/; $1
=> "abcdabcdabce"
```

This needs to be extended for longer possible tails.

But taking the dimensions of your data I doubt that regexes are appropriate.

You could test all $patterns which repeat at least once (or x times): calculate $k = $m % $l with $m = length($str), check if $str starts and ends with the same substring $tail of length $k, and then check if the pattern continues repeating.

Or start by eliminating all possible $tails and check if $l of a repeating pattern is a divisor of length($rest).
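A minimal sketch of the first of these two ideas, untested against the real data and not a tuned solution. The sub name `shortest_pattern` is my own invention, and building `$pattern x $n` in full is an assumption that only suits short strings; at the sizes in the test harness the comparison would have to be done chunk-wise instead.

```perl
use strict;
use warnings;

# Hypothetical helper: try every pattern length $l from shortest to
# longest and return the first ($pattern, $n, $k) that rebuilds $str.
sub shortest_pattern {
    my ($str) = @_;
    my $m = length $str;
    for my $l ( 1 .. $m ) {
        my $k = $m % $l;          # length of the partial repeat (tail)
        my $n = int( $m / $l );   # number of full repeats

        # Cheap pre-check: the tail must equal the first $k characters.
        next if $k && substr( $str, 0, $k ) ne substr( $str, $m - $k );

        # Full check: does the candidate pattern rebuild the string?
        my $pattern = substr( $str, 0, $l );
        return ( $pattern, $n, $k )
            if $str eq ( $pattern x $n ) . substr( $pattern, 0, $k );
    }
    return;    # only reached for the empty string
}

my ( $pattern, $n, $k ) = shortest_pattern('abcdabcdabceabcdabcdabceab');
print "$pattern n=$n k=$k\n";    # prints "abcdabcdabce n=2 k=2"
```

Because $l runs upwards, the first hit is the shortest pattern, and $l == $m always matches with $n == 1 and $k == 0, so a non-empty string never falls through.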

Had no time to check all the other posted solutions and don't wanna reinvent the wheel, so I better stop here! =)

HTH

Cheers Rolf

( addicted to the Perl Programming Language)

That looks suspiciously like a close variation on choroba's attempt.

Had no time to check all the other posted solutions and don't wanna reinvent the wheel,

All the tested solutions, along with how they fared in my test harness, are nicely grouped together in Re: Finding repeat sequences. (Results:Part 1).

