Here is a simple basic test for "is there an $n-digit repeated substring in $string?":

$string =~ /(.{$n}).*\1/;

That is: find an $n-digit substring, capture it as $1, then look forward in the string for another copy of $1. This is a reasonably efficient way to search for a repeated substring of a given length. However on a 9.2MB string that is still going to take a lot of work.

Note that this won't find an overlapping repeat, such as the repeated "121" in the string "3121213"; a more complex regexp could find such repeats, but would be much slower working on large strings.

One approach to finding the longest repeat would be to iterate $n upwards from 1:

sub longestrepeat { my($string) = @_; for (my $n = 1; 1; ++$n) { return $n - 1 unless $string =~ /(.{$n}).*\1/; } }

Other approaches to consider if that isn't fast enough are a) to do a binary chop (probably not useful in this case, since I think $n is likely to be smaller than log_2(digits)), or b) to start each subsequent search at the point the previous one succeeded (not too hard to code, but not likely to gain much).

You can extend the regexp to search for 3 copies of the same substring:

$string =~ /(.{$n}).*\1.*\1/;
.. or more, by including more copies of the .*\1 construct.

In general however I consider it extremely unlikely that you'll discover any patterns special to the digits of Mersenne primes expressed in decimal, beyond the features that any Mersenne number would have (ie the terminating digit patterns implicit in a number of the form 2n - 1). And of course doing the same thing in binary would be very boring. :)


In reply to Re: Longest repeated string... by hv
in thread Longest repeated string... by Yzzyx

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":