Re: Fast string hash in portable perl?
by duff (Parson) on Dec 19, 2003 at 19:25 UTC
|
There's the B::hash() routine which returns a string representation of the hexadecimal representation of the number generated by perl's hashing function. perldoc B Is that good enough?
| [reply] [d/l] |
|
I think this will be the fastest just because it is actually using compiled code in the perl bin to perform the hash. The only issue is that perls hash keygen does allow collisions and they are going to be much more frequent than lets say an md5 generator. it all depends what you are using it for.
| [reply] |
|
This is exactly what i was looking for. I need to spend more time in the perldoc B. :-) I'll futz with it a bit to make sure it has few enough collisions, but I expect it will do nicely.
Thank you kindly!
j
| [reply] |
Re: Fast string hash in portable perl? [DO NOT USE!]
by BrowserUk (Patriarch) on Dec 19, 2003 at 19:41 UTC
|
Update: 14 years after posting huck shows that this code is broken!
You'd best do your own testing of this before you consider using it, to convince yourself it is a faithful implementation of the original.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
| [reply] [d/l] |
|
| [reply] |
|
| [reply] |
|
We have this hashing function (Jenkins Hash) as an XS extension, yes you have to compile it (we have compiled on Win32 and Linux without issues) but the result is of course much faster. You can get a copy from http://tachyon.perlmonk.org/Digest-JHash-0.02.tar.gz
We did a fair bit of dispersion testing on it and found it to be pretty damn good using URLs as the input data which is what we wanted it for. The dispersion peaks at somewhere around 90%+ of slots from memory.
Maybe we should post Digest::JHash and Digest::JHahs::PurePerl to CPAN?. Here it is as XS (Digest::JHash.XS):
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
/* Jenkins Hash http://burtleburtle.net/bob/hash/doobs.html */
const int DEBUG = 0;
#define MIX(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
unsigned long jhash( SV* str )
{
STRLEN rawlen;
char* p;
unsigned long a, b, c, len, length;
/* extract the string data and string length from the perl scalar
+*/
p = (char*)SvPV(str, rawlen);
length = len = (unsigned long)rawlen;
/* Test for undef or null string case and return 0 */
if ( length == 0 ) {
DEBUG && printf( "Recieved a null or undef string!\n" );
return 0;
}
DEBUG && printf( "Received string '%.*s'.\n", (int)len, p );
a = b = 0x9e3779b9; /* golden ratio suggested by Jenkins */
c = 0;
while (len >= 12)
{
a += (p[0] + (((unsigned long)p[1])<<8) + (((unsigned long)p[2
+])<<16) +
(((unsigned long)p[3])<<24));
b += (p[4] + (((unsigned long)p[5])<<8) + (((unsigned long)p[6
+])<<16) +
(((unsigned long)p[7])<<24));
c += (p[8] + (((unsigned long)p[9])<<8) + (((unsigned long)p[1
+0])<<16) +
(((unsigned long)p[11])<<24));
MIX(a, b, c);
p += 12;
len -= 12;
}
c += length;
switch(len) {
case 11: c+=((unsigned int)p[10]<<24);
case 10: c+=((unsigned int)p[9]<<16);
case 9: c+=((unsigned int)p[8]<<8);
case 8: b+=((unsigned int)p[7]<<24);
case 7: b+=((unsigned int)p[6]<<16);
case 6: b+=((unsigned int)p[5]<<8);
case 5: b+=((unsigned int)p[4]);
case 4: a+=((unsigned int)p[3]<<24);
case 3: a+=((unsigned int)p[2]<<16);
case 2: a+=((unsigned int)p[1]<<8);
case 1: a+=((unsigned int)p[0]);
}
MIX(a, b, c);
DEBUG && printf( "Hash value is %d.\n", (int)(c) );
return(c);
}
MODULE = Digest::JHash PACKAGE = Digest::JHash
unsigned long
jhash(str)
SV* str
# Digest::JHash.pm
package Digest::JHash;
use strict;
require Exporter;
require DynaLoader;
our @ISA = qw(Exporter DynaLoader);
our @EXPORT_OK = qw( jhash );
our $VERSION = '0.02';
bootstrap Digest::JHash $VERSION;
1;
=head1 NAME
Digest::JHash - Perl extension for JHash Hashing Algoritm
=head1 SYNOPSIS
use Digest::JHash qw(jhash);
$digest = jhash($data);
# note that calling jhash() directly like this is the fastest way:
$digest = Digest::JHash::jhash($data);
=head1 DESCRIPTION
The C<Digest::JHash> module allows you to use the fast JHash hashing a
+lgorithm
developed by Bob Jenkins from within Perl programs. The algorithm take
+s as
input a message of arbitrary length and produces as output a 32-bit
"message digest" of the input in the form of an unsigned long integer.
Call it a low calorie version of MD5 if you like.
See http://burtleburtle.net/bob/hash/doobs.html for more information.
=head1 FUNCTIONS
=over 4
=item jhash($data)
This function will calculate the JHash digest of the "message" in $dat
+a
and return a 32 bit integer result (an unsigned long in the C)
=back
=head1 EXPORTS
None by default but you can have the jhash() function if you ask nicel
+y.
See below for reasons not to use Exporter (it is slower than a direct
+call)
=head1 SPEED NOTE
If speed is a major issue it is roughly twice as fast to do call the j
+hash()
function like Digest::JHash::jhash('message') than it is to import the
jhash() method using Exporter so you can call it as simply jhash('mess
+age').
There is a short script that demonstrates the speed of differecnt call
+ing
methods (direct, OO and Imported) in misc/oo_vs_func.pl
=head1 AUTHORS
The JHash implementation was written by Bob Jenkins
(C<bob_jenkins@burtleburtle.net>).
This perl extension was written by Andrew Towers
(C<mariofrog@bigpond.com>).
A few mods were added by James Freeman
(C<james.freeman@id3.org.uk>).
=head1 SEE ALSO
http://burtleburtle.net/bob/hash/doobs.html
=cut
| [reply] [d/l] |
|
Thanks tachyon, that's great.
I did compile the original C code as a C program and use it for comparisons with the perl version when I was working on this, though that was some time ago. The C was obviously much faster.
Unfortunately, despite my best efforts, I still live in the "nothing I must compile myself" perl world.
I do have a version of 5.8.1 that I compiled with Borland that works, but every time I've tried to use modules that require compilation in conjunction with that build, I spend hours or days trying to hack the build into working. I'm ashamed to say that I gave up and went back to being a AS dependant.
Thanks to the number of kind folks around that make binary versions of those modules that AS haven't gotten around to yet, I rarely encounter a module that I cannot install and use in short order, given the appropriate googling.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
| [reply] |
|
|
Re: Fast string hash in portable perl?
by Corion (Patriarch) on Dec 19, 2003 at 19:25 UTC
|
I believe if you're hell-bent on pure Perl, there is no faster thing than the CRC32 for "sufficiently good" hashing. Enough work has gone into CRC32 to make it fast, as it has been in use for a long time. Oh - there even is a module for it - CRC32
But why not simply use Digest::MD5 ? I think it's core nowadays, and there even is a (slow) pure Perl version available ...
Update:Added link to the CRC32 module
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
| [reply] [d/l] |
|
PurePerl generally means things that don't require a a C compiler to install, so that Win32 can easily use it. CRC32 has an XS component.
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
| [reply] |
|
| [reply] |
|
|
| [reply] |
|
Not 32 bit? Then just run the B::hash on the MD5! Just kidding folks, just kidding!
You are exactly right on CRC32. I can't think of any case where CRC32 is really a good choice.
| [reply] |
|