Here is a simple basic test for "is there an \$n-digit repeated substring in \$string?":

```  \$string =~ /(.{\$n}).*\1/;

That is: find an \$n-digit substring, capture it as \$1, then look forward in the string for another copy of \$1. This is a reasonably efficient way to search for a repeated substring of a given length. However on a 9.2MB string that is still going to take a lot of work.

Note that this won't find an overlapping repeat, such as the repeated "121" in the string "3121213"; a more complex regexp could find such repeats, but would be much slower working on large strings.

One approach to finding the longest repeat would be to iterate \$n upwards from 1:

```  sub longestrepeat {
my(\$string) = @_;
for (my \$n = 1; 1; ++\$n) {
return \$n - 1 unless \$string =~ /(.{\$n}).*\1/;
}
}

Other approaches to consider if that isn't fast enough are a) to do a binary chop (probably not useful in this case, since I think \$n is likely to be smaller than log_2(digits)), or b) to start each subsequent search at the point the previous one succeeded (not too hard to code, but not likely to gain much).

You can extend the regexp to search for 3 copies of the same substring:

```  \$string =~ /(.{\$n}).*\1.*\1/;
.. or more, by including more copies of the .*\1 construct.

In general however I consider it extremely unlikely that you'll discover any patterns special to the digits of Mersenne primes expressed in decimal, beyond the features that any Mersenne number would have (ie the terminating digit patterns implicit in a number of the form 2n - 1). And of course doing the same thing in binary would be very boring. :)

Hugo

In reply to Re: Longest repeated string... by hv
in thread Longest repeated string... by Yzzyx

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Are you posting in the right place? Check out Where do I post X? to know for sure.
• Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
• Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
• Want more info? How to link or or How to display code and escape characters are good places to start.