in reply to Most common substring

This could be done in one pass through the string.

Conceptually it would work like so.

use strict;

# $number is assumed to be declared (with my) and set before this point.
my @nums = split('', $number);    # the number to work on
my (@most_common, %once, %common);
my $mcc = 2;    # if there are no common substrings don't store

while (@nums > 4) {
    my $key = join('', @nums[0..4]);
    $common{$key}++;
    if ($common{$key} > $mcc) {
        $mcc = $common{$key};
        @most_common = ($key);       # new max: set entire array to $key
    }
    elsif ($common{$key} == $mcc) {
        push(@most_common, $key);    # tack $key onto largest
    }
    if ($common{$key} == 1) {
        $once{$key} = 1;
    }
    elsif ($once{$key}) {
        delete $once{$key};
    }
    shift @nums;    # slide down 1 digit
}

print "Most ($mcc): ", join(', ', @most_common), "\n";
print "Once: ", join(', ', (keys %once)), "\n";

I have added a bit of code to track the most common values and save them in an array. I would expect the least common to be, most likely, the set of substrings used only once, unless you define least common as the lowest count that is still more than one; in that case much more bookkeeping code would be needed for the low end. On the high end, if the current count is larger than the previous largest, we simply forget the list and start a new one with the current substring as its only member. I also started $mcc at 2 to avoid a lot of needless bookkeeping for substrings seen only once.
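
If least common is taken to mean the lowest count greater than one, one way to sidestep the extra in-pass bookkeeping is a second pass over the counts once the scan has finished. This is only a rough sketch of that alternative, not part of the one-pass code: the sub name least_common_repeated and the sample counts are made up for illustration, and it assumes the counts live in a hash shaped like %common.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref of substring => count,
# return the lowest count greater than 1 and the substrings that have it.
sub least_common_repeated {
    my ($counts) = @_;
    my ($min, @keys);
    for my $key (sort keys %$counts) {
        my $n = $counts->{$key};
        next if $n < 2;                  # skip substrings seen only once
        if (!defined $min || $n < $min) {
            $min  = $n;                  # new minimum: restart the list
            @keys = ($key);
        }
        elsif ($n == $min) {
            push @keys, $key;            # ties with the current minimum
        }
    }
    return ($min, @keys);                # $min is undef if nothing repeats
}

# Made-up counts, for illustration only.
my %sample = ('12345' => 3, '23456' => 1, '34567' => 2, '45678' => 2);
my ($min, @least) = least_common_repeated(\%sample);
print "Least ($min): ", join(', ', @least), "\n";
```

The cost is one extra walk over the hash, which should be small next to the scan itself since the hash holds at most one entry per distinct five-digit substring.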

I ran the code on sample data of 2^9999 and got:

Most (2): 96655, 84403, 66114, 11748, 17484, 40380, 74169, 41696, 41844, 47194, 71162, 92065, 54736, 28703, 84689, 22165, 92292, 47369, 41891, 87379, 37954, 04224, 42244, 08257, 35778, 23461, 29741, 19795, 79549, 78117, 56688, 58090, 43252, 32528, 42018, 98726, 03714, 41492, 24440, 01363, 40657, 90170, 41347, 48935, 89357

Once: every other substring, which is a long list.
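
For anyone wanting to repeat the run, the input string could be built with the core Math::BigInt module. This is a sketch under the assumption that the sample data means the decimal digits of 2 raised to the 9999th power; the scan code would then follow it, using $number as set here.

```perl
use strict;
use warnings;
use Math::BigInt;

# Assumption: the sample data is the decimal expansion of 2**9999.
my $number = Math::BigInt->new(2)->bpow(9999)->bstr;

# Quick sanity check on the size of the input before scanning it.
print length($number), " digits\n";
```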