Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: Fastest way to lookup a point in a set

by roboticus (Chancellor)
on Aug 05, 2017 at 16:36 UTC ( [id://1196802]=note: print w/replies, xml ) Need Help??


in reply to Re: Fastest way to lookup a point in a set
in thread Fastest way to lookup a point in a set

Ken:

I'd've saved myself a little time had I read the entire thread before going off and trying a double-hash. Mine was essentially the same as yours, but I was concerned about auto-vivification making the hash grow without bounds during use. When I tested with my anti-autovivification bit, I noticed a speed boost on the double-hash vs. the others when doing lookups for items that don't exist.

I modified all the lookup routines accept a check count so I could specify the number of points to check, and built a points2 array with a bunch of points that aren't in the original. So my lookup routine looks like:

sub dbl_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { (exists $cells->{$p->[0]} and exists $cells->{$p->[0]}{$p->[1] +}) and ++$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; }

I then called each of the routines with the original data, the original data plus the same amount of non-existing points, and then the original data plus twice the number of non-existing points:

timethese 200000, { StjA => sub { stj_look($stj_ref, $npoints }, StjB => sub { stj_look($stj_ref, 2*$npoints }, StjC => sub { stj_look($stj_ref, 3*$npoints }, ... };

The results were:

$ perl pm_1196786.pl 131, keys 131 str: 131 131, keys 131 stj: 131 big: 131 pak: 131 dbl: 131 Benchmark: timing 200000 iterations of BigA, BigB, BigC, DblA, DblB, D +blC, PakA, PakB, PakC, StjA, StjB, StjC, StrA, StrB, StrC... BigA: 7 wallclock secs ( 6.51 usr + 0.00 sys = 6.51 CPU) @ 30 +698.39/s (n=200000) BigB: 13 wallclock secs (12.95 usr + 0.00 sys = 12.95 CPU) @ 15 +439.25/s (n=200000) BigC: 19 wallclock secs (18.97 usr + 0.00 sys = 18.97 CPU) @ 10 +543.52/s (n=200000) DblA: 6 wallclock secs ( 6.50 usr + 0.00 sys = 6.50 CPU) @ 30 +773.97/s (n=200000) DblB: 10 wallclock secs ( 9.23 usr + 0.00 sys = 9.23 CPU) @ 21 +659.09/s (n=200000) DblC: 12 wallclock secs (11.97 usr + 0.00 sys = 11.97 CPU) @ 16 +709.83/s (n=200000) PakA: 6 wallclock secs ( 6.31 usr + 0.00 sys = 6.31 CPU) @ 31 +685.68/s (n=200000) PakB: 13 wallclock secs (12.45 usr + 0.00 sys = 12.45 CPU) @ 16 +060.39/s (n=200000) PakC: 18 wallclock secs (18.34 usr + 0.00 sys = 18.34 CPU) @ 10 +902.75/s (n=200000) StjA: 7 wallclock secs ( 6.91 usr + 0.00 sys = 6.91 CPU) @ 28 +960.32/s (n=200000) StjB: 14 wallclock secs (13.56 usr + 0.00 sys = 13.56 CPU) @ 14 +746.00/s (n=200000) StjC: 20 wallclock secs (19.94 usr + 0.00 sys = 19.94 CPU) @ 10 +031.10/s (n=200000) StrA: 6 wallclock secs ( 6.36 usr + 0.00 sys = 6.36 CPU) @ 31 +446.54/s (n=200000) StrB: 13 wallclock secs (12.67 usr + 0.00 sys = 12.67 CPU) @ 15 +782.83/s (n=200000) StrC: 19 wallclock secs (18.69 usr + 0.00 sys = 18.69 CPU) @ 10 +702.63/s (n=200000)

Dbl fares pretty well against the others when a significant amount of lookups fail.

Full source...

$ cat pm_1196786.pl use strict; # use warnings; use Benchmark qw(timethese); my @points = ( [ 0, 0 ], [ -1, -2 ], [ 1, 2 ], [ -1, 2 ], [ 1, -2 ], [ 0, 1 ], [ 1, 0 ], [ -1, 0 ], [ 0, -1 ], [ 2147483647, 2147483647 ], [ 2147483647, -2147483647 ], [ -2147483647, 2147483647 ], [ -2147483647, -2147483647 ], [ -1, 2147483647 ], [ 2147483647, 1 ], [ -2, 2147483646 ], [ 1, -2147483647 ], [ 1234561, 1234562 ], [ 1234563, -1234564 ], [ -1234565, 1234566 ], [ -1234567, -1234568 ], [ 10, 11 ], [ 11, 12 ], [ 12, 13 ], [ 13, 14 ], [ 14, 15 ], [ 15, 16 ], [ 16, 17 ], [ 17, 18 ], [ 18, 19 ], [ 19, 20 ], [ 1001, 1002 ], [ 1003, 1004 ], [ 1005, 1006 ], [ 1007, 1008 ], [ 1009, 1010 ], [ 1011, 1012 ], [ 1013, 1014 ], [ 1015, 1016 ], [ 1017, 1018 ], [ 1019, 1020 ], [ -1001, -1002 ], [ -1003, -1004 ], [ -1005, -1006 ], [ -1007, -1008 ], [ -1009, -1010 ], [ -1011, -1012 ], [ -1013, -1014 ], [ -1015, -1016 ], [ -1017, -1018 ], [ -1019, -1020 ], [ 99910, 99911 ], [ 99911, 99912 ], [ 99912, 99913 ], [ 99913, 99914 ], [ 99914, 99915 ], [ 99915, 99916 ], [ 99916, 99917 ], [ 99917, 99918 ], [ 99918, 99919 ], [ 99919, 99920 ], [ -99910, -99911 ], [ -99911, -99912 ], [ -99912, -99913 ], [ -99913, -99914 ], [ -99914, -99915 ], [ -99915, -99916 ], [ -99916, -99917 ], [ -99917, -99918 ], [ -99918, -99919 ], [ -99919, -99920 ], [ 1099910, 1099911 ], [ 1099911, 1099912 ], [ 1099912, 1099913 ], [ 1099913, 1099914 ], [ 1099914, 1099915 ], [ 1099915, 1099916 ], [ 1099916, 1099917 ], [ 1099917, 1099918 ], [ 1099918, 1099919 ], [ 1099919, 1099920 ], [ -1099910, -1099911 ], [ -1099911, -1099912 ], [ -1099912, -1099913 ], [ -1099913, -1099914 ], [ -1099914, -1099915 ], [ -1099915, -1099916 ], [ -1099916, -1099917 ], [ -1099917, -1099918 ], [ -1099918, -1099919 ], [ -1099919, -1099920 ], [ 91099910, 91099911 ], [ 91099911, 91099912 ], [ 91099912, 91099913 ], [ 91099913, 91099914 ], [ 91099914, 91099915 ], [ 91099915, 91099916 ], [ 91099916, 91099917 ], [ 91099917, 91099918 ], [ 91099918, 91099919 ], [ 91099919, 91099920 ], [ -91099910, -91099911 ], [ -91099911, -91099912 ], [ -91099912, -91099913 ], [ -91099913, -91099914 ], [ -91099914, -91099915 ], [ -91099915, -91099916 ], [ -91099916, -91099917 ], [ -91099917, -91099918 ], [ -91099918, -91099919 ], [ -91099919, -91099920 ], [ 491099910, 491099911 ], [ 491099911, 491099912 ], [ 491099912, 491099913 ], [ 491099913, 491099914 ], [ 491099914, 491099915 ], [ 491099915, 491099916 ], [ 491099916, 491099917 ], [ 491099917, 491099918 ], [ 491099918, 491099919 ], [ 491099919, 491099920 ], [ -491099910, -491099911 ], [ -491099911, -491099912 ], [ -491099912, -491099913 ], [ -491099913, -491099914 ], [ -491099914, -491099915 ], [ -491099915, -491099916 ], [ -491099916, -491099917 ], [ -491099917, -491099918 ], [ -491099918, -491099919 ], [ -491099919, -491099920 ], ); my $npoints = @points; my @points2 = @points; push @points2, [ $_->[0]+29, $_->[1]+45 ] for @points; push @points2, [ $_->[0]+450, $_->[1]+290 ] for @points; sub dbl_hash { my $cnt=0; my %cells; for my $p (@points) { $cells{$p->[0]}{$p->[1]}=undef; } for my $p (@points) { (exists $cells{$p->[0]} && exists $cells{$p->[0]}{$p->[1]}) and + ++$cnt; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; print "dbl: $cnt\n"; return \%cells; } sub str_hash { my $cnt=0; my %cells; for my $p (@points) { my $h = $p->[0] . ':' . $p->[1]; my ( $x, $y ) = split ':', $h; if ($x != $p->[0]) { die; } if ($y != $p->[1]) { die; } $cells{$h} = undef; } print "$npoints, keys ", scalar(keys %cells), "\n"; scalar(keys %cells) == $npoints or die; for my $p (@points) { my $h = $p->[0] . ':' . $p->[1]; exists $cells{$h} and ++$cnt; #or die; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; print "str: $cnt\n"; return \%cells; } sub stj_hash { my $cnt=0; my %cells; for my $p (@points) { my $h = join(':', @$p); my ( $x, $y ) = split ':', $h; if ($x != $p->[0]) { die; } if ($y != $p->[1]) { die; } $cells{$h} = undef; } print "$npoints, keys ", scalar(keys %cells), "\n"; scalar(keys %cells) == $npoints or die; for my $p (@points) { exists $cells{join(':',@$p)} and ++$cnt; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; print "stj: $cnt\n"; return \%cells; } sub big_hash { my $cnt=0; my %cells; for my $p (@points) { my $h = ($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF); my $x = $h & 0x00000000FFFFFFFF; my $y = ($h & 0xFFFFFFFF00000000) >> 32; if ($x >> 31) { $x -= 0xFFFFFFFF + 1 } if ($y >> 31) { $y -= 0xFFFFFFFF + 1 } if ($x != $p->[0]) { die; } if ($y != $p->[1]) { die; } $cells{$h} = undef; } scalar(keys %cells) == $npoints or die; for my $p (@points) { my $h = ($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF); exists $cells{$h} and ++$cnt; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; print "big: $cnt\n"; return \%cells; } sub pak_hash { my $cnt=0; my %cells; for my $p (@points) { my $h = pack "ii", $p->[0], $p->[1]; my ( $x, $y ) = unpack "ii", $h; if ($x != $p->[0]) { die; } if ($y != $p->[1]) { die; } $cells{$h} = undef; } scalar(keys %cells) == $npoints or die; for my $p (@points) { my $h = pack "ii", $p->[0], $p->[1]; exists $cells{$h} and ++$cnt; #or die; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; print "pak: $cnt\n"; return \%cells; } sub stj_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { exists $cells->{join(':', @$p)} and ++$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } sub str_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { exists $cells->{$p->[0] . ':' . $p->[1]} and ++$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } sub big_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { exists $cells->{($p->[1] << 32) | ($p->[0] & 0xFFFFFFFF)} and ++ +$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } sub pak_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { exists $cells->{pack "ii", $p->[0], $p->[1]} and ++$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } sub dbl_look { my $cells = shift; my $chk = shift; my $cnt = 0; for my $p (@points2[0 .. $chk]) { (exists $cells->{$p->[0]} and exists $cells->{$p->[0]}{$p->[1] +}) and ++$cnt; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } my $str_ref = str_hash(); my $stj_ref = stj_hash(); my $big_ref = big_hash(); my $pak_ref = pak_hash(); my $dbl_ref = dbl_hash(); timethese 200000, { StjA => sub { stj_look($stj_ref, $npoints) }, StrA => sub { str_look($str_ref, $npoints) }, BigA => sub { big_look($big_ref, $npoints) }, PakA => sub { pak_look($pak_ref, $npoints) }, DblA => sub { dbl_look($dbl_ref, $npoints) }, StjB => sub { stj_look($stj_ref, 2*$npoints) }, StrB => sub { str_look($str_ref, 2*$npoints) }, BigB => sub { big_look($big_ref, 2*$npoints) }, PakB => sub { pak_look($pak_ref, 2*$npoints) }, DblB => sub { dbl_look($dbl_ref, 2*$npoints) }, StjC => sub { stj_look($stj_ref, 3*$npoints) }, StrC => sub { str_look($str_ref, 3*$npoints) }, BigC => sub { big_look($big_ref, 3*$npoints) }, PakC => sub { pak_look($pak_ref, 3*$npoints) }, DblC => sub { dbl_look($dbl_ref, 3*$npoints) }, };

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^3: Fastest way to lookup a point in a set
by kcott (Archbishop) on Aug 06, 2017 at 04:48 UTC

    That's a good point. Depending on usage and distribution, short-circuiting on "exists $cells[$x]" could also improve speed. In fact, knowing more about the distribution, organising the matrix as "$cells[$y][$x]", instead of "$cells[$x][$y]", could be another improvement.

    — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1196802]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2024-04-24 03:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found