Re^2: Fastest way to lookup a point in a set

Ken:

I'd've saved myself a little time had I read the entire thread before going off and trying a double-hash. Mine was essentially the same as yours, but I was concerned about auto-vivification making the hash grow without bounds during use. When I tested with my anti-autovivification bit, I noticed a speed boost on the double-hash vs. the others when doing lookups for items that don't exist.

I modified all the lookup routines accept a check count so I could specify the number of points to check, and built a points2 array with a bunch of points that aren't in the original. So my lookup routine looks like:

sub dbl_look {
    my $cells = shift;
    my $chk = shift;
    my $cnt = 0;
    for my $p (@points2[0 .. $chk]) {
        (exists $cells->{$p->[0]} and exists $cells->{$p->[0]}{$p->[1]
+}) and ++$cnt;
    }
    exists $cells->{'notfound'} and die;
    exists $cells->{'notfound2'} and die;
    exists $cells->{'notfound3'} and die;
}
[download]

I then called each of the routines with the original data, the original data plus the same amount of non-existing points, and then the original data plus twice the number of non-existing points:

timethese 200000, {
   StjA => sub { stj_look($stj_ref, $npoints },
   StjB => sub { stj_look($stj_ref, 2*$npoints },
   StjC => sub { stj_look($stj_ref, 3*$npoints },
   ...
};
[download]

The results were:

$ perl pm_1196786.pl
131, keys 131
str: 131
131, keys 131
stj: 131
big: 131
pak: 131
dbl: 131
Benchmark: timing 200000 iterations of BigA, BigB, BigC, DblA, DblB, D
+blC, PakA, PakB, PakC, StjA, StjB, StjC, StrA, StrB, StrC...
      BigA:  7 wallclock secs ( 6.51 usr +  0.00 sys =  6.51 CPU) @ 30
+698.39/s (n=200000)
      BigB: 13 wallclock secs (12.95 usr +  0.00 sys = 12.95 CPU) @ 15
+439.25/s (n=200000)
      BigC: 19 wallclock secs (18.97 usr +  0.00 sys = 18.97 CPU) @ 10
+543.52/s (n=200000)
      DblA:  6 wallclock secs ( 6.50 usr +  0.00 sys =  6.50 CPU) @ 30
+773.97/s (n=200000)
      DblB: 10 wallclock secs ( 9.23 usr +  0.00 sys =  9.23 CPU) @ 21
+659.09/s (n=200000)
      DblC: 12 wallclock secs (11.97 usr +  0.00 sys = 11.97 CPU) @ 16
+709.83/s (n=200000)
      PakA:  6 wallclock secs ( 6.31 usr +  0.00 sys =  6.31 CPU) @ 31
+685.68/s (n=200000)
      PakB: 13 wallclock secs (12.45 usr +  0.00 sys = 12.45 CPU) @ 16
+060.39/s (n=200000)
      PakC: 18 wallclock secs (18.34 usr +  0.00 sys = 18.34 CPU) @ 10
+902.75/s (n=200000)
      StjA:  7 wallclock secs ( 6.91 usr +  0.00 sys =  6.91 CPU) @ 28
+960.32/s (n=200000)
      StjB: 14 wallclock secs (13.56 usr +  0.00 sys = 13.56 CPU) @ 14
+746.00/s (n=200000)
      StjC: 20 wallclock secs (19.94 usr +  0.00 sys = 19.94 CPU) @ 10
+031.10/s (n=200000)
      StrA:  6 wallclock secs ( 6.36 usr +  0.00 sys =  6.36 CPU) @ 31
+446.54/s (n=200000)
      StrB: 13 wallclock secs (12.67 usr +  0.00 sys = 12.67 CPU) @ 15
+782.83/s (n=200000)
      StrC: 19 wallclock secs (18.69 usr +  0.00 sys = 18.69 CPU) @ 10
+702.63/s (n=200000)
[download]

Dbl fares pretty well against the others when a significant amount of lookups fail.

Full source...

$ cat pm_1196786.pl
use strict;
# use warnings;
use Benchmark qw(timethese);

my @points = (
   [  0,  0 ],
   [ -1, -2 ],
   [  1,  2 ],
   [ -1,  2 ],
   [  1, -2 ],
   [  0,  1 ],
   [  1,  0 ],
   [ -1,  0 ],
   [  0, -1 ],
   [  2147483647,  2147483647 ],
   [  2147483647, -2147483647 ],
   [ -2147483647,  2147483647 ],
   [ -2147483647, -2147483647 ],
   [  -1,  2147483647 ],
   [  2147483647, 1 ],
   [  -2,  2147483646 ],
   [  1,   -2147483647 ],
   [  1234561,  1234562 ],
   [  1234563,  -1234564 ],
   [  -1234565,  1234566 ],
   [  -1234567,  -1234568 ],
   [  10,  11 ],
   [  11,  12 ],
   [  12,  13 ],
   [  13,  14 ],
   [  14,  15 ],
   [  15,  16 ],
   [  16,  17 ],
   [  17,  18 ],
   [  18,  19 ],
   [  19,  20 ],
   [  1001,  1002 ],
   [  1003,  1004 ],
   [  1005,  1006 ],
   [  1007,  1008 ],
   [  1009,  1010 ],
   [  1011,  1012 ],
   [  1013,  1014 ],
   [  1015,  1016 ],
   [  1017,  1018 ],
   [  1019,  1020 ],
   [  -1001,  -1002 ],
   [  -1003,  -1004 ],
   [  -1005,  -1006 ],
   [  -1007,  -1008 ],
   [  -1009,  -1010 ],
   [  -1011,  -1012 ],
   [  -1013,  -1014 ],
   [  -1015,  -1016 ],
   [  -1017,  -1018 ],
   [  -1019,  -1020 ],
   [  99910,  99911 ],
   [  99911,  99912 ],
   [  99912,  99913 ],
   [  99913,  99914 ],
   [  99914,  99915 ],
   [  99915,  99916 ],
   [  99916,  99917 ],
   [  99917,  99918 ],
   [  99918,  99919 ],
   [  99919,  99920 ],
   [  -99910,  -99911 ],
   [  -99911,  -99912 ],
   [  -99912,  -99913 ],
   [  -99913,  -99914 ],
   [  -99914,  -99915 ],
   [  -99915,  -99916 ],
   [  -99916,  -99917 ],
   [  -99917,  -99918 ],
   [  -99918,  -99919 ],
   [  -99919,  -99920 ],
   [  1099910,  1099911 ],
   [  1099911,  1099912 ],
   [  1099912,  1099913 ],
   [  1099913,  1099914 ],
   [  1099914,  1099915 ],
   [  1099915,  1099916 ],
   [  1099916,  1099917 ],
   [  1099917,  1099918 ],
   [  1099918,  1099919 ],
   [  1099919,  1099920 ],
   [  -1099910,  -1099911 ],
   [  -1099911,  -1099912 ],
   [  -1099912,  -1099913 ],
   [  -1099913,  -1099914 ],
   [  -1099914,  -1099915 ],
   [  -1099915,  -1099916 ],
   [  -1099916,  -1099917 ],
   [  -1099917,  -1099918 ],
   [  -1099918,  -1099919 ],
   [  -1099919,  -1099920 ],
   [  91099910,  91099911 ],
   [  91099911,  91099912 ],
   [  91099912,  91099913 ],
   [  91099913,  91099914 ],
   [  91099914,  91099915 ],
   [  91099915,  91099916 ],
   [  91099916,  91099917 ],
   [  91099917,  91099918 ],
   [  91099918,  91099919 ],
   [  91099919,  91099920 ],
   [  -91099910,  -91099911 ],
   [  -91099911,  -91099912 ],
   [  -91099912,  -91099913 ],
   [  -91099913,  -91099914 ],
   [  -91099914,  -91099915 ],
   [  -91099915,  -91099916 ],
   [  -91099916,  -91099917 ],
   [  -91099917,  -91099918 ],
   [  -91099918,  -91099919 ],
   [  -91099919,  -91099920 ],
   [  491099910,  491099911 ],
   [  491099911,  491099912 ],
   [  491099912,  491099913 ],
   [  491099913,  491099914 ],
   [  491099914,  491099915 ],
   [  491099915,  491099916 ],
   [  491099916,  491099917 ],
   [  491099917,  491099918 ],
   [  491099918,  491099919 ],
   [  491099919,  491099920 ],
   [  -491099910,  -491099911 ],
   [  -491099911,  -491099912 ],
   [  -491099912,  -491099913 ],
   [  -491099913,  -491099914 ],
   [  -491099914,  -491099915 ],
   [  -491099915,  -491099916 ],
   [  -491099916,  -491099917 ],
   [  -491099917,  -491099918 ],
   [  -491099918,  -491099919 ],
   [  -491099919,  -491099920 ],
);
my $npoints = @points;

my @points2 = @points;
push @points2, [ $_->[0]+29, $_->[1]+45 ] for @points;
push @points2, [ $_->[0]+450, $_->[1]+290 ] for @points;

sub dbl_hash {
   my $cnt=0;
   my %cells;
   for my $p (@points) {
       $cells{$p->[0]}{$p->[1]}=undef;
   }
   for my $p (@points) {
       (exists $cells{$p->[0]} && exists $cells{$p->[0]}{$p->[1]}) and
+ ++$cnt;
   }
   exists $cells{'notfound'} and die;
   exists $cells{'notfound2'} and die;
   exists $cells{'notfound3'} and die;
   print "dbl: $cnt\n";
   return \%cells;
}

sub str_hash {
   my $cnt=0;
   my %cells;
   for my $p (@points) {
      my $h = $p->[0] . ':' . $p->[1];
      my ( $x, $y ) = split ':', $h;
      if ($x != $p->[0]) { die; }
      if ($y != $p->[1]) { die; }
      $cells{$h} = undef;
   }
   print "$npoints, keys ", scalar(keys %cells), "\n";
   scalar(keys %cells) == $npoints or die;
   for my $p (@points) {
      my $h = $p->[0] . ':' . $p->[1];
      exists $cells{$h} and ++$cnt; #or die;
   }
   exists $cells{'notfound'} and die;
   exists $cells{'notfound2'} and die;
   exists $cells{'notfound3'} and die;
   print "str: $cnt\n";
   return \%cells;
}

sub stj_hash {
   my $cnt=0;
   my %cells;
   for my $p (@points) {
      my $h = join(':', @$p);
      my ( $x, $y ) = split ':', $h;
      if ($x != $p->[0]) { die; }
      if ($y != $p->[1]) { die; }
      $cells{$h} = undef;
   }
   print "$npoints, keys ", scalar(keys %cells), "\n";
   scalar(keys %cells) == $npoints or die;
   for my $p (@points) {
      exists $cells{join(':',@$p)} and ++$cnt;
   }
   exists $cells{'notfound'} and die;
   exists $cells{'notfound2'} and die;
   exists $cells{'notfound3'} and die;
   print "stj: $cnt\n";
   return \%cells;
}

sub big_hash {
   my $cnt=0;
   my %cells;
   for my $p (@points) {
      my $h = ($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF);
      my $x =  $h & 0x00000000FFFFFFFF;
      my $y = ($h & 0xFFFFFFFF00000000) >> 32;
      if ($x >> 31) { $x -= 0xFFFFFFFF + 1 }
      if ($y >> 31) { $y -= 0xFFFFFFFF + 1 }
      if ($x != $p->[0]) { die; }
      if ($y != $p->[1]) { die; }
      $cells{$h} = undef;
   }
   scalar(keys %cells) == $npoints or die;
   for my $p (@points) {
      my $h = ($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF);
      exists $cells{$h} and ++$cnt;
   }
   exists $cells{'notfound'} and die;
   exists $cells{'notfound2'} and die;
   exists $cells{'notfound3'} and die;
   print "big: $cnt\n";
   return \%cells;
}

sub pak_hash {
   my $cnt=0;
   my %cells;
   for my $p (@points) {
      my $h = pack "ii", $p->[0], $p->[1];
      my ( $x, $y ) =  unpack "ii", $h;
      if ($x != $p->[0]) { die; }
      if ($y != $p->[1]) { die; }
      $cells{$h} = undef;
   }
   scalar(keys %cells) == $npoints or die;
   for my $p (@points) {
      my $h = pack "ii", $p->[0], $p->[1];
      exists $cells{$h} and ++$cnt; #or die;
   }
   exists $cells{'notfound'} and die;
   exists $cells{'notfound2'} and die;
   exists $cells{'notfound3'} and die;
   print "pak: $cnt\n";
   return \%cells;
}

sub stj_look {
   my $cells = shift;
   my $chk = shift;
   my $cnt = 0;
   for my $p (@points2[0 .. $chk]) {
      exists $cells->{join(':', @$p)} and ++$cnt;
   }
   exists $cells->{'notfound'} and die;
   exists $cells->{'notfound2'} and die;
   exists $cells->{'notfound3'} and die;
}

sub str_look {
   my $cells = shift;
   my $chk = shift;
   my $cnt = 0;
   for my $p (@points2[0 .. $chk]) {
      exists $cells->{$p->[0] . ':' . $p->[1]} and ++$cnt;
   }
   exists $cells->{'notfound'} and die;
   exists $cells->{'notfound2'} and die;
   exists $cells->{'notfound3'} and die;
}

sub big_look {
   my $cells = shift;
   my $chk = shift;
   my $cnt = 0;
   for my $p (@points2[0 .. $chk]) {
      exists $cells->{($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF)} and ++
+$cnt;
   }
   exists $cells->{'notfound'} and die;
   exists $cells->{'notfound2'} and die;
   exists $cells->{'notfound3'} and die;
}

sub pak_look {
   my $cells = shift;
   my $chk = shift;
   my $cnt = 0;
   for my $p (@points2[0 .. $chk]) {
      exists $cells->{pack "ii", $p->[0], $p->[1]} and ++$cnt;
   }
   exists $cells->{'notfound'} and die;
   exists $cells->{'notfound2'} and die;
   exists $cells->{'notfound3'} and die;
}

sub dbl_look {
    my $cells = shift;
    my $chk = shift;
    my $cnt = 0;
    for my $p (@points2[0 .. $chk]) {
        (exists $cells->{$p->[0]} and exists $cells->{$p->[0]}{$p->[1]
+}) and ++$cnt;
    }
   exists $cells->{'notfound'} and die;
   exists $cells->{'notfound2'} and die;
   exists $cells->{'notfound3'} and die;
}

my $str_ref = str_hash();
my $stj_ref = stj_hash();
my $big_ref = big_hash();
my $pak_ref = pak_hash();
my $dbl_ref = dbl_hash();

timethese 200000, {
   StjA => sub { stj_look($stj_ref, $npoints) },
   StrA => sub { str_look($str_ref, $npoints) },
   BigA => sub { big_look($big_ref, $npoints) },
   PakA => sub { pak_look($pak_ref, $npoints) },
   DblA => sub { dbl_look($dbl_ref, $npoints) },
   StjB => sub { stj_look($stj_ref, 2*$npoints) },
   StrB => sub { str_look($str_ref, 2*$npoints) },
   BigB => sub { big_look($big_ref, 2*$npoints) },
   PakB => sub { pak_look($pak_ref, 2*$npoints) },
   DblB => sub { dbl_look($dbl_ref, 2*$npoints) },
   StjC => sub { stj_look($stj_ref, 3*$npoints) },
   StrC => sub { str_look($str_ref, 3*$npoints) },
   BigC => sub { big_look($big_ref, 3*$npoints) },
   PakC => sub { pak_look($pak_ref, 3*$npoints) },
   DblC => sub { dbl_look($dbl_ref, 3*$npoints) },
};
[download]

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Comment on Re^2: Fastest way to lookup a point in a set Select or Download Code


Pathologically Eclectic Rubbish Lister
	PerlMonks