in reply to
Finding repeat sequences.

Consider the following:

`use strict;
use warnings;
# 1 2 3 4 5
+ 6
# 12345678901234567890123456789012345678901234567890123456
+7890
my $search = 'abcdefabcababeabcdefabcababeabcdefabcababeabcdefabcababe
+abcd';
print "searching '$search'\n length=" . length($search) . "\n";
# FIRST, FIND THE LENGTH OF THE TAIL: THIS IS THE SHORTEST SUBSTRING
# WHICH MATCHES THE START OF THE STRING.
# TRY TO EXPAND THIS NUMBER AS RAPIDLY AS POSSIBLE.
my $tail_length = 1;
my $tail_step = int(length($search) / 2);
while ($tail_step > 0) {
$tail_length += $tail_step
while ( substr($search, 0, $tail_length + $tail_step)
eq substr($search, -($tail_length+$tail_step),
($tail_length+$tail_step)));
$tail_step = int($tail_step / 2);
}
print "tail is '" . substr($search, 0, $tail_length) .
"' length=$tail_length\n";
# ===
my $body_length = length($search) - $tail_length;
print "body is '$body_length' characters long ... \n";
# THE BODY MUST CONTAIN AN INTEGRAL NUMBER OF OCCURRENCES
# OF THE STRING.
# THEREFORE, FIND THE LONGEST ONE THAT OCCURS TWICE.
my $longest = $body_length;
my $n = $body_length - 1;
while ($n > 1) {
if ( ($body_length % $n) == 0) {
print "try $n\n";
if ( substr($search, 0, $n) eq substr($search, $n, $n) ) {
$longest = $n;
last;
}
}
$n--;
}
print "longest string '" . substr($search, 1, $longest) .
"' length=$longest\n";
`

The “key” to this puzzle, as I read it, is the __tail__. This is the shortest string at the end of the string which matches the beginning of the string. The example code tries aggressively to find that number by adding power-of-two-smaller successive increments as many times as it can. *(Note:* might this introduce a flaw, vs. a simple 2-up count?)

The second part of the algorithm now tries to find the longest string which occurs an integral number of times in the leading *(repeated ...)* portion. We know that the length of this string must be a mathematical factor, i.e. *(length* mod *factor* == 0). Only the first and second occurrence must be considered. If none can be found, the string consists of one non-repeated occurrence.