Below is a pure perl version of the lookup2 hash by Bob Jenkins as talked about at A Hash Function for Hash Table Lookup.
Edit: This is an updated version, in particular the mix4 mentioned at Re: Pure perl Jenkins 32 bit Hash is now called mix4x and may be discarded. Also there is code to select the proper version based on $Config{ivsize} and a test call.
{ # has use integer/bytes
use integer;
use bytes;
# http://www.perlmonks.org/?node_id=315881
# http://burtleburtle.net/bob/c/lookup2.c
# http://burtleburtle.net/bob/hash/doobs.html
# http://search.cpan.org/~shlomif/Digest-JHash/lib/Digest/JHash.pm
# http://cpansearch.perl.org/src/SHLOMIF/Digest-JHash-0.10/JHash.xs */
+
use constant GOLDEN_RATIO => 0x9e3779b9;
use constant A => 0;
use constant B => 1;
use constant C => 2;
use constant FFFFFFFF => 0xffffffff;
use constant KEY => 0;
use constant INITHASH => 1;
sub mix4 ($$$) {
# 32bit version
# per http://www.perlmonks.org/?node_id=1203705 this is a revised 32bi
+t under 'use integer';
$_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>>13)
+; }
$_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<< 8)
+; }
$_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>>13)
+; }
$_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>>12)
+; }
$_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<<16)
+; }
$_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>> 5)
+; }
$_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>> 3)
+; }
$_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<<10)
+; }
$_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>>15)
+; }
}
sub mix4x ($$$) {
# per http://www.perlmonks.org/?node_id=1203705 this is wrong
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>13);
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<< 8);
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>13);
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>12);
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<16);
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>> 5);
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>> 3);
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<10);
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>15);
}
sub mix8 ($$$) {
# 64bit version
$_[A] &= FFFFFFFF;
$_[B] &= FFFFFFFF;
$_[C] &= FFFFFFFF;
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>>13) ) &
+ FFFFFFFF;
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<< 8) ) &
+ FFFFFFFF;
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>>13) ) &
+ FFFFFFFF;
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>>12) ) &
+ FFFFFFFF;
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<<16) ) &
+ FFFFFFFF;
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>> 5) ) &
+ FFFFFFFF;
$_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>> 3) ) &
+ FFFFFFFF;
$_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<<10) ) &
+ FFFFFFFF;
$_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>>15) ) &
+ FFFFFFFF;
}
sub jhash_pp_hex {
my ($a, $b, $c) = ( GOLDEN_RATIO, GOLDEN_RATIO, $_[INITHASH] );
my ($p, $length) = (0, length $_[KEY]);
my $len=$length;
my($x,$y,$z);
while ($len>=12) {
($x,$y,$z) = unpack 'LLL', substr($_[KEY], $p, 12);
$a+=$x;$b+=$y;$c+=$z;
mix($a, $b, $c);
$p += 12;
$len-=12;
}
# even if len==0 we need another round to mix in the length
($x,$y,$z) = unpack 'LLL', substr($_[KEY] . (chr(0)x12), $p, 1
+2);
$z<<=8; # /* the first byte of c is reserved for the length *
+/
$z+=$length;
$a+=$x;$b+=$y;$c+=$z;
mix($a, $b, $c);
my $hex = unpack("H*", pack("N", $c));
return $hex;
} # jhash_pp_hex
use Config;
if ( $Config{ivsize} == 4 ) { *main::mix=*main::mix4; }
else { *main::mix=*main::mix8; }
} # has use integer/bytes
print jhash_pp_hex('abcdef',0)."\n";
I had a situation where i could not use Digest::JHash
because i did not have access to a compiler.
I needed to hash filenames for a filetracking program using a mysql database.
Rather than carry the filename in 4+ tables i use a unique fnid assigned in a fnid table that holds the filename in a text bucket.
As you cannot really index a text field i was originally using md5 on the filename as a key. I knew there can be collisions but that to select a single filename SELECT fnid FROM fnid WHERE fnmd5=? and fn=? would do ok if fnmd5 was a index.
But md5 in hex is 32 chars, and that was real big, bigger than i needed.
So i searched for 32 bit hashs, found the jenkins variants
and found Re: Fast string hash in portable perl? [DO NOT USE!] by BrowserUk so i decided to try that.
My disappointment will be described in a separate reply thread about the history below [history] Pure perl Jenkins 32 bit Hash.
This also calls into question
what to do about Digest::JHash,
again i will use a separate reply thread [Digest::JHash problem] Pure perl Jenkins 32 bit Hash to talk about that.
Overall i am happy with this.
I realize there are newer jenkins and other 32bit hashs,
but this will do.
It reduced the phpmyadmin reported size of the fnid table by 50%.
It is fast enough for my needs, it seems to be the hash perl uses for its hashs.