PerlMonks  

Re^2: Finding repeat sequences. (only mostly regex)

by BrowserUk (Pope)
on Jun 18, 2013 at 20:04 UTC ( #1039646=note )


in reply to Re: Finding repeat sequences. (only mostly regex)
in thread Finding repeat sequences.

"I assume that the pattern must repeat at least twice, otherwise, the full string is always the longest answer."

I wish that were the case. It mostly will be, but sometimes the string will consist of 1 complete and 1 partial rep.

But the partial rep at the end *will* exactly match the same number of characters at the beginning of the string, so it will always be possible to determine the rep.

But how do I encode that in a regex, or at least avoid a brute-force chop and compare?
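For reference, the brute-force chop and compare that the question hopes to avoid could look roughly like the sketch below. The name shortest_rep is hypothetical, and returning the shortest qualifying rep is an assumption, since the post does not say which rep is wanted when several fit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the shortest prefix
# of $s that, repeated and then truncated, reproduces $s exactly.
sub shortest_rep {
    my ($s) = @_;
    my $len = length $s;
    for my $p (1 .. $len - 1) {
        # A prefix of length $p is a valid rep when the string with its
        # first $p characters chopped off lines up with its own beginning.
        return substr($s, 0, $p)
            if substr($s, $p) eq substr($s, 0, $len - $p);
    }
    return $s;    # nothing shorter fits: the string is its own single rep
}

print shortest_rep('abcabcab'), "\n";                 # abc
print shortest_rep('aaaabaaaabaaaaabaaaab'), "\n";    # aaaabaaaaba
```

Each candidate length costs one string comparison, so the worst case is quadratic in the length of the string. It is only a baseline to check a regex answer against.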


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^3: Finding repeat sequences. (only mostly regex)
by choroba (Abbot) on Jun 18, 2013 at 20:10 UTC
    Can't you find the incomplete repetition with
    /^(.*).*\1$/
    ?

    If it is complete, you get the whole one.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      There are no gaps between the repeats, so the uncaptured .* is not required (actually mustn't be there).

      And if the second rep is incomplete \1 will never match before $.

      I've been trying variations on

      $s = 'aaaabaaaabaaaaabaaaab';; $s =~ m[^(.+)\1*(.*?$)] and $1 =~ $2 and print "$1/$2";;
      aaaabaaaabaaaaabaaaab/

      With the idea that any partial rep at the end can be verified against the beginning of the full rep, but it needs to happen inside the regex and cause backtracking.


        The .* matches the missing part of the last incomplete repetition.
Re^3: Finding repeat sequences. (only regex)
by tye (Cardinal) on Jun 18, 2013 at 20:13 UTC

    Note that, based on that definition, if the first and last characters are the same, then the answer is "the string minus the last character". Which leads to:

    /^(.+?).*\1$/

    Which leads to a full solution of:

    /^((.*?).*?)\2*\1$/

    which might be horribly inefficient (at least for some cases) or might not; I haven't considered it.

    - tye        

      Nice reversal of the logic and closer:

      $s = 'aaaabaaaabaaaaabaaaab';; $s =~ /^((.*?).*?)\2*\1$/ and print "$2/$1";;
      a/aaaabaaaab


        I think I had \1 and \2 backward. Then you just have to disallow the trivial solution:

        $s = 'aaaabaaaabaaaaabaaaab'; $s =~ /^((.*?).*?)(?=.)\1*\2$/ and print "$2/$1";

        (Update: Dropped the unneeded () around \1 that I had introduced while debugging. You probably also need to change .*? to .* so you get the longest solution not the shortest.)

        - tye        
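To compare the pattern as posted with the variant suggested in the Update, here is a small sketch that wraps both. The wrapper find_rep and its return convention are assumptions for illustration, not part of the thread, and the "greedy" pattern assumes the Update means replacing both occurrences of .*? with .* in the regex.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread): returns (rep, partial),
# or an empty list when the string contains no repeat at all.
sub find_rep {
    my ($s, $greedy) = @_;
    my $re = $greedy
        ? qr/^((.*).*)(?=.)\1*\2$/       # assumed reading of the Update
        : qr/^((.*?).*?)(?=.)\1*\2$/;    # the pattern as posted
    if ($s =~ $re) {
        my ($rep, $partial) = ($1, $2);  # copy the captures before returning
        return ($rep, $partial);
    }
    return;
}

for my $s ('abcabcab', 'aaaabaaaabaaaaabaaaab') {
    for my $greedy (0, 1) {
        my ($rep, $partial) = find_rep($s, $greedy);
        printf "%-6s rep=%s partial=%s\n",
            $greedy ? 'greedy' : 'lazy', $rep, $partial;
    }
}
# lazy   rep=abc partial=ab
# greedy rep=abcabc partial=ab
# lazy   rep=aaaabaaaabaaaaab partial=aaaab
# greedy rep=aaaabaaaaba partial=aaaabaaaab
```

The lazy form settles on the shortest trailing partial it can, and the greedy form on the longest, so the two can report different reps for the same string. Which one counts as the right answer depends on a requirement the thread does not spell out.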
