PerlMonks  

Re^2: Finding repeat sequences.

by BrowserUk (Pope)
on Jun 21, 2013 at 01:24 UTC (#1040049)


in reply to Re: Finding repeat sequences.
in thread Finding repeat sequences.

and the task is to find maximum $pattern to fit these constraints?

Um. I cannot see any errors in that. So yes.

If yes, some simple mathematics should already considerably minimize the set of possible combinations you need to investigate with regexes.

Hm. A realistic, but relatively small, example from my test harness:

b:64000 in s: 640028748 hdb :: 24.290438 s

L=64000, N = 10,000, K=28,740.

But those could equally well be: L=16,000, N = 40,001, K=12,740; or (thousands*) of other permutations.

I don't think it helps.

(*I'm being very, very conservative; my best guess is hundreds of millions.)
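
To make the arithmetic above concrete, here is a minimal sketch. It assumes the total length is the one implied by L=64000, N=10,000, K=28,740 (the harness line prints a slightly different figure), and that the only constraint is that the pattern repeats at least twice. The helper name `decompose` is hypothetical.

```perl
use strict;
use warnings;

# Total length implied by the L, N, K figures in the post (an assumption).
my $M = 64_000 * 10_000 + 28_740;

# Hypothetical helper: split a length $m into N full repeats of a
# pattern of length $l, plus a partial tail of K characters.
sub decompose {
    my ($m, $l) = @_;
    return (int($m / $l), $m % $l);
}

# The two decompositions quoted in the post.
printf "L=%d N=%d K=%d\n", $_, decompose($M, $_) for 64_000, 16_000;

# Any L from 1 to int(M/2) leaves room for at least two repeats,
# so every one of them is a candidate on length alone.
my $candidates = int($M / 2);
print "candidate pattern lengths: $candidates\n";
```

Under those assumptions the length alone rules nothing out: every L up to half the string length yields a valid (N, K) pair, which is in line with the "hundreds of millions" estimate.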


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^3: Finding repeat sequences.
by LanX (Canon) on Jun 21, 2013 at 02:22 UTC
    Here is a regex solution which works for the shortest possible tail of length k:

    DB<127> $str
     => "abcdabcdabceabcdabcdabceab"
    DB<128> $str=~/^((.+?).*)\2$/; $rest=$1, $tail=$2
     => ("abcdabcdabceabcdabcdabce", "ab")
    DB<129> $rest =~ /^(.+?)\1*$/; $1
     => "abcdabcdabce"

    It needs to be extended for longer possible tails.

    But given the dimensions of your data, I doubt that regexes are appropriate.

    You could test every candidate $pattern that repeats at least once (or x times): calculate $k = $m % $l, where $m = length($str) and $l = length($pattern); check that $str starts and ends with the same substring $tail of length $k; then check that the pattern keeps repeating across the rest of the string.
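
A minimal sketch of that first idea, using substr comparisons rather than regexes. The helper name `find_repeat`, the `$min_reps` parameter, and the choice to return the shortest fitting pattern are all assumptions made for illustration, not something taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pattern, repeat count, tail) for the
# shortest pattern length $l that fits, or an empty list if none does.
sub find_repeat {
    my ($str, $min_reps) = @_;
    $min_reps //= 2;                  # assumption: at least two full copies
    my $m = length $str;
    for my $l (1 .. int($m / $min_reps)) {
        my $k = $m % $l;              # length of the partial tail
        # Cheap test first: $str must start and end with the same $k chars.
        next if $k && substr($str, 0, $k) ne substr($str, -$k);
        # Full test: $str has period $l iff it equals itself shifted by $l.
        next if substr($str, 0, $m - $l) ne substr($str, $l);
        return (substr($str, 0, $l), int($m / $l), substr($str, 0, $k));
    }
    return;
}

my ($pattern, $n, $tail) = find_repeat("abcdabcdabceabcdabcdabceab");
print "$pattern x $n + '$tail'\n";
```

The shifted-string comparison does the "keeps repeating" check in one pass per candidate length, but each pass is still linear in the string length, so this is only a sketch for modest inputs.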

    Or start by eliminating each possible $tail in turn, and check whether the length $l of a repeating pattern is a divisor of the length of the $rest.
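
The tail-first alternative could be sketched as follows. The helper name `find_by_tail` is hypothetical, and the code assumes the tail must be strictly shorter than the pattern and that the pattern occurs at least twice in the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: try each tail length $k, then each divisor $l of
# the remaining length. Returns (pattern, tail) or an empty list.
sub find_by_tail {
    my ($str) = @_;
    my $m = length $str;
    for my $k (0 .. $m - 1) {
        # A valid tail is both a prefix and a suffix of $str.
        next if $k && substr($str, 0, $k) ne substr($str, -$k);
        my $rest = substr($str, 0, $m - $k);
        my $len  = length $rest;
        # The pattern must be longer than the tail and fit at least twice.
        for my $l ($k + 1 .. int($len / 2)) {
            next if $len % $l;        # $l must divide length($rest)
            my $pattern = substr($rest, 0, $l);
            return ($pattern, substr($str, 0, $k))
                if $rest eq $pattern x ($len / $l);
        }
    }
    return;
}

my ($pattern, $tail) = find_by_tail("abcdabcdabceabcdabcdabceab");
print "pattern=$pattern tail=$tail\n";
```

Building `$pattern x ($len / $l)` copies the whole string for every divisor tried, so this is meant to show the logic on small inputs rather than to cope with data of the size discussed above.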

    Had no time to check all the other posted solutions and don't wanna reinvent the wheel, so I better stop here! =)

    HTH

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      That looks suspiciously like a close variation on choroba's attempt.

      Had no time to check all the other posted solutions and don't wanna reinvent the wheel,

      All the tested solutions, along with how they fared in my test harness, are nicely grouped together in Re: Finding repeat sequences. (Results:Part 1).


