I'm not a bioinformatics guy, but if massive runs are actually a significant concern in this system, it's pretty easy to fix the worst-case behavior by adaptively boosting the window, so that worst case regresses to hdb's solution:
sub kennethk {
my $string = shift;
my $off = 0;
my $step = 10;
my @indices = sort {$off = 0;
until (substr($string, $a+$off, $step) cmp sub
+str($string, $b+$off, $step)){
$off += $step;
$step *= 2;
}
+ } 0 .. length($string) - 1;
$_++ for @indices;
return @indices;
}
which gives a degenerate-case comparison of
RobertCraven 2.90/s
kennethk 12.8/s
xxx 13.1/s
hdb 14.2/s
and comparable behavior in the rest of the benchmarks.
I also note your computer is twice as fast as mine. Wanna trade?
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
|