Comparing a large set of DNA sequences

shamshersingh has asked for the wisdom of the Perl Monks concerning the following question:

I have a large set (100000+) of short DNA reads, each 20 characters long. I need to compare all reads against each other and pull out those that differ at just 1 position. Here's the script I came up with.

$| = 1;
my $compare_count = 0;
for (my $i = 0; $i < @kmers; $i++ ) {
    for (my $j = $i + 1; $j < @kmers; $j++ ) {
        print "\rComparing sequence $i to $j";
        my @result = PCCompare::dissimilarity($kmers[$i], $kmers[$j], 1);
        if ($result[0] == 1) {
            print "\rMatch found: $kmers[$i], $kmers[$j]\n";
            push @variant_kmers, ($kmers[$i], $kmers[$j]);
        }
        $compare_count++;
    }
}
print "\rFinished: $compare_count comparisons made.\n";

The problem is that this loop runs very slowly. It takes on the order of days to process 100000 sequences. Is there a way to make the process faster?
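
For scale, comparing every read against every other read means n(n-1)/2 comparisons, which is roughly 5 billion for 100000 reads. One possible way to avoid the all-against-all loop is sketched below. It is not taken from the script above and does not use PCCompare: it assumes all reads have the same length and that exact duplicates can be ignored. The idea is to mask one position at a time and group reads by the masked string in a hash; two distinct reads that land in the same group differ only at the masked position. The sub name one_mismatch_pairs is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns array refs
# [read_a, read_b] for every pair of distinct reads that differ at exactly
# one position. Assumes all reads have the same length.
sub one_mismatch_pairs {
    my @reads = @_;

    # Drop exact duplicates so identical reads are never reported as a pair.
    my %seen;
    my @uniq = grep { !$seen{$_}++ } @reads;
    return () unless @uniq;

    my $len = length $uniq[0];
    my @pairs;

    for my $pos (0 .. $len - 1) {
        # Group reads by their content with position $pos masked out.
        my %bucket;
        for my $read (@uniq) {
            my $key = $read;
            substr($key, $pos, 1, '*');
            push @{ $bucket{$key} }, $read;
        }

        # Reads sharing a masked key agree everywhere except at $pos.
        for my $group (values %bucket) {
            next if @$group < 2;
            for my $i (0 .. $#$group - 1) {
                for my $j ($i + 1 .. $#$group) {
                    push @pairs, [ $group->[$i], $group->[$j] ];
                }
            }
        }
    }
    return @pairs;
}

# Small illustrative input (made-up reads, not real data).
my @demo_pairs = one_mismatch_pairs(qw(ACGT ACGA TCGT GGGG ACGT));
print "Match found: $_->[0], $_->[1]\n" for @demo_pairs;
```

With this approach each read is hashed once per position, so 100000 reads of length 20 need about 2 million hash insertions instead of billions of pairwise calls. The cost is memory for one hash of masked keys at a time. Printing a progress line for every comparison, as the original loop does, also adds overhead of its own.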
