PerlMonks  

Re^2: 32bit/64bit hash function: Use perls internal hash function?

by sectokia (Pilgrim)
on Apr 10, 2022 at 12:49 UTC ( [id://11142893] )


in reply to Re: 32bit/64bit hash function: Use perls internal hash function?
in thread 32bit/64bit hash function: Use perls internal hash function?

Thanks! I am a complete noob when it comes to XS; I really need to learn it.

Re^3: 32bit/64bit hash function: Use perls internal hash function?
by vr (Curate) on Apr 11, 2022 at 12:20 UTC
    use strict;
    use warnings;
    use feature 'say';
    use B 'hash';
    use Crypt::xxHash 'xxhash3_64bits';
    use Digest::xxH64 'xx64';
    use Benchmark 'cmpthese';

    use Inline C => <<'_C_';
    U32 myhash(SV* sv) {
        STRLEN len;
        U32 hash = 0;
        const char *s = SvPVbyte(sv, len);
        PERL_HASH(hash, s, len);
        return hash;
    }
    _C_

    srand 1234;
    my $s = pack 'C*', map rand 256, 1 .. 64;

    cmpthese -2, {
        hash   => sub { my $x = hash( $s )},
        myhash => sub { my $x = myhash( $s )},
        xxhash => sub { my $x = xxhash3_64bits( $s, 0 )},
        xx64   => sub { my $x = xx64( $s )},
    };

    __END__
                 Rate   hash myhash xxhash   xx64
    hash    1944302/s     --   -52%   -54%   -84%
    myhash  4088577/s   110%     --    -3%   -66%
    xxhash  4233986/s   118%     4%     --   -65%
    xx64   11994386/s   517%   193%   183%     --

    This is perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread

    Try xxHash? Digest::xxH64 is not on CPAN (but it is linked to from the xxHash home page, i.e. officially 'endorsed'(?) :)), Crypt::xxHash needs a fix to install on Windows, and Digest::xxHash (not in the example above) is slower and therefore perhaps not of much interest in the context of 'B::hash is too slow'.
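For anyone comparing against B::hash from the benchmark above: it returns the hash as a "0x..." hex string rather than a number, so a conversion step is needed before using it as an integer key, and that string formatting is possibly part of why it trails the Inline C version. A minimal sketch using only core modules; the input string `$key` is a made-up placeholder, not something from the thread:

```perl
use strict;
use warnings;
use B ();

# Hypothetical input; any byte string will do.
my $key = 'some key';

# B::hash returns a string of the form "0x...", not a number.
my $hex = B::hash($key);

# hex() accepts the "0x" prefix and yields the numeric hash value.
my $hash = hex $hex;

printf "%s -> %s -> %u\n", $key, $hex, $hash;
```

The printed value will generally differ from one run to the next, since perl randomizes its hash seed per process unless PERL_HASH_SEED is set. That matters if the hashes are meant to be stored or compared across processes.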

    As already mentioned, Judy::HS provides both hashing and sparse storage built in under the hood, so maybe hashing by hand is not what you need. I have 'played' (i.e. not in serious 'production') with Judy (but not with Judy::HS) to store and access huge sparse data, and yes, speed is comparable to Perl hashes with a significantly smaller RAM appetite.

    Another option to consider: Math::GSL::SparseMatrix (GSL being solid and renowned, etc.). As above, I 'played' with a 64-bit-addressed sparse single-row (or was it single-column?) vector. It is slower than Judy, yet it installs without hassle on Windows, can theoretically address a 128-bit sparse space (because it is 2D), and can store data shorter than 64-bit integers, i.e. it needs even less RAM in that case.