note
bioMan
<p>I have done a quick test. Quick because of your blazing fast algorithm. My runtimes for 3 to 200 strings is given below. The times come from the Time::HiRes::gettimeofday function in your program. I set $subStrSize = 256.</p>
<code>
3,0.001355
4,0.002366
5,0.003744
6,0.005187
7,0.006808
8,0.008807
9,0.011459
10,0.014172
11,0.017328
12,0.020473
13,0.023491
14,0.027905
15,0.031813
16,0.036426
17,0.043177
18,0.067228
19,0.052098
20,0.059449
30,0.132611
40,0.235241
50,0.367828
60,0.510851
70,0.695534
80,0.913879
90,1.178992
100,1.427899
125,2.254988
150,3.269179
200,5.831225
</code>
<p>These results are from a second copy of your program that I created. It appears my first copy still has a bug.</p>
<p>Yes, there must be a bug in my original copy. Running my full data set with my backup copy of your program gave a runtime of 14.759568. I'd say that's a little better than three years! I'll check the results to see how they compare with some of my old results from limited data sets. That check could take a little time :-)</p>
<p>The problem could be with the way PerlIDE runs scripts. All the times above are from the command line (Win2000).</p><p> I ran the full data set with the second copy of the program from PerlIDE and got a runtime of 15.147881. So the problem is with <i>my first copy</i> of your program.</p>
<p>I <i>really</i> like your algorithm. It's speed will allow me to modify my original data and test a couple of theories. You've got a real winner.</p>
<i>update</i>
<p>I checked the results from the program and the algorithm isn't finding all the matches. I made a modification which seems to do better.</p>
<p>I added a new variable called $filterSize. This is used to exclude matches below a minimum length for output. It replaces $subStrSize. $subStrSize is now used to set the minimum string search size. The statement used to output the results is also modified.</p>
<code>
my $subStrSize = 128;
my $filterSize = 256;
.
.
.
$localBest[4] > $filterSize
? printf OUT "%10s::%-10s Length[%4d] Starts(%4d %4d)\n",
$strings[$localBest[0]][0], $strings[$localBest[1]][0], $localBest[4], $localBest[2], $localBest[3]
: print OUT " Didn't find LCS for $sourceName and $targetName\n";
</code>
<p>The last statement replaces:</p>
<code>
@localBest
? printf OUT "%10s::%-10s Length[%4d] Starts(%4d %4d)\n",
$strings[$localBest[0]][0], $strings[$localBest[1]][0], $localBest[4], $localBest[2], $localBest[3]
: print OUT " Didn't find LCS for $sourceName and $targetName\n";
</code>
<p>I hoped this would give me better results. I used $filterSize = 256. I set $subStrSize to 16, 32, 64, or 128. The runtimes for $subStrSize were 250.505445, 124.928021, 57.274645, and 32.787612, respectively. I got the best results with $subStrSize = 128, $filterSize = 256. I don't know why $subStrSize = 128 gave better results than shorter lengths. Also $subStrSize = 256 gave significantly poorer results than any of the shorter lengths.</p>
485464
486734