You can do slightly better than what you just suggested for a mergesort of k lists (when k is not super-small).
Yes, but it's much harder to code off the top of my head. :)
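(For concreteness, one way to get that improvement is a heap-based k-way merge, which costs O(n log k) instead of re-scanning all k heads for every output element. The sketch below is only an illustration of the general idea -- the sub names and sample lists are made up, not anyone's actual code.)

#!/usr/bin/perl
# Illustrative heap-based k-way merge of sorted lists (a sketch, not a
# definitive implementation). Heap entries are
# [current_value, which_list, position_in_that_list].
use warnings;
use strict;

sub merge_k {
    my @lists = @_;    # each argument is a reference to a sorted array
    my @heap  = map  { [ $lists[$_][0], $_, 0 ] }
                grep { @{ $lists[$_] } } 0 .. $#lists;
    # A sorted array satisfies the min-heap property, so this is a lazy
    # heapify of the k initial entries.
    @heap = sort { $a->[0] <=> $b->[0] } @heap;

    my @merged;
    while (@heap) {
        my ($val, $i, $pos) = @{ $heap[0] };
        push @merged, $val;
        if ($pos + 1 < @{ $lists[$i] }) {
            # Replace the root with the next element from the same list.
            $heap[0] = [ $lists[$i][ $pos + 1 ], $i, $pos + 1 ];
            sift_down(\@heap, 0);
        }
        else {
            # That list is exhausted; drop the root and refill it from the end.
            my $last = pop @heap;
            if (@heap) {
                $heap[0] = $last;
                sift_down(\@heap, 0);
            }
        }
    }
    return @merged;
}

sub sift_down {
    my ($heap, $i) = @_;
    while (1) {
        my $smallest = $i;
        for my $child (2 * $i + 1, 2 * $i + 2) {
            $smallest = $child
                if $child < @$heap && $heap->[$child][0] < $heap->[$smallest][0];
        }
        last if $smallest == $i;
        @{$heap}[ $i, $smallest ] = @{$heap}[ $smallest, $i ];
        $i = $smallest;
    }
}

print join(" ", merge_k([1, 4, 9], [2, 3, 10], [5, 6, 7, 8])), "\n";
# prints: 1 2 3 4 5 6 7 8 9 10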
What I fail to understand is why you say that perrin's hash algorithm takes O(n log n) time. No, a hash dereferencing operation takes amortized constant time, and you are just doing a dereference for each element in each list. That's a linear time algorithm.
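(Roughly the shape being described -- one hash operation per element of each list. I don't have perrin's actual code in front of me, so this stand-in just pulls out the elements common to all the lists; the sample data is invented.)

#!/usr/bin/perl
# Illustration only: one hash increment per element of each list, so the
# whole pass is linear in the total number of elements if each hash
# operation is amortized O(1).
use warnings;
use strict;

my @lists = ([1, 2, 3, 5], [2, 3, 5, 8], [1, 2, 5, 13]);

my %seen;
for my $list (@lists) {
    # The "dereference for each element": one hash op per list element.
    $seen{$_}++ for @$list;
}

# Elements that appeared in every list (assumes no duplicates within a list).
my @common = grep { $seen{$_} == @lists } keys %seen;
print "common: @{[ sort { $a <=> $b } @common ]}\n";    # common: 2 5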
My understanding has always been that hash operations over n items take lg n time, since in general a hash will divide up n items into n / lg n buckets, then do a linear search through the appropriate bucket.
Some experiments seem to bear this out: as I increase the number of elements in a hash, the number of operations per second I can do decreases. This wouldn't happen if hash operations took constant time.
#!/usr/bin/perl
use warnings;
use strict;
use Time::HiRes qw(time);

foreach my $size (100_000, 200_000, 400_000, 800_000, 1_600_000, 3_200_000)
{
    my %h;
    my ($start, $end);

    # Test insert times
    $start = time;
    foreach my $i (1..$size)
    {
        $h{$i} = $i;
    }
    $end = time;
    my $insert_time = $end - $start;

    # Now test lookup times
    $start = time;
    foreach my $i (1..$size)
    {
        $h{$i} == $i
            or die "what the hell?";
    }
    $end = time;
    my $lookup_time = $end - $start;

    printf "%10d: %5.3f inserts/sec, %5.3f lookups/sec\n",
        $size, $size/$insert_time, $size/$lookup_time;
}
100000: 446368.754 inserts/sec, 1050330.051 lookups/sec
200000: 424977.633 inserts/sec, 821176.759 lookups/sec
400000: 398651.394 inserts/sec, 805369.896 lookups/sec
800000: 399463.754 inserts/sec, 787853.768 lookups/sec
1600000: 390381.775 inserts/sec, 766870.015 lookups/sec
3200000: 377860.715 inserts/sec, 720909.739 lookups/sec
Then again, my recollections of algorithmic amortizations are hazy at best, so feel free to convince me that I'm wrong. :)