Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: Re: Eek! goto?

by BrowserUk (Patriarch)
on Feb 12, 2003 at 15:43 UTC ( [id://234705]=note: print w/replies, xml ) Need Help??


in reply to Re: Eek! goto?
in thread Eek! goto?

If it were not for pfaut's unpack suggestion, I almsot certainly would be using it, but the loss of performance of the perl implimentation relative to the C switch version means pushing as much of the work into perl as I can has dividends.

The original C code is here.

My current best perl implementation of the while thing is the hash2() in the test code below. hash() being my original brute force conversion. The two version appear to function the same as each other, with the unpack version coming out about %50 faster.

However, the number of first pass collisions worries me a little. It could be just be a function of the limited range of inputs, but its higher than I expected. The other possibility is that both implementations are equally wrong. If you (or anyone) has a few moments to compare the perl and the C versions and point out any obvious differences I'd be grateful.

Also, if anyone has any idea's or pointers for methods of testing the goodness of this type of hashing function I'd really like to hear/see them. Unfortunately, the vast majority of the hits on google relate to hashing fuctions used for cryptography which is a completly different ballpark. I'm slowly whittling them down and wading through them, but haven't found anything applicable yet.

My test code

#! perl -slw use strict; use constant GOLDEN_RATIO => 0x9e3779b9; use constant A => 0; use constant B => 1; use constant C => 2; sub mix ($$$) { $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>13); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<< 8); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>13); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>12); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<16); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>> 5); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>> 3); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<10); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>15); } sub hash { my ($key, $init) = @_; my @k = unpack 'C*', $key; my $len = length $key; my ($a, $b, $c) = (GOLDEN_RATIO, GOLDEN_RATIO, $init); while($len >=12) { $a += ($k[0] + ($k[1]<<8) + ($k[2]<<16) + ($k[3]<24)); $b += ($k[4] + ($k[5]<<8) + ($k[6]<<16) + ($k[7]<24)); $a += ($k[8] + ($k[9]<<8) + ($k[10]<<16) + ($k[11]<24)); mix($a,$b,$c); splice(@k, 0, 11); $len -= 12; } $c += length $key; goto "LAST".$len; LAST11: $c+= $k[10] <<24; LAST10: $c+= $k[9] <<16; LAST9: $c+= $k[8] <<8; LAST8: $b+= $k[7] <<24; LAST7: $b+= $k[6] <<16; LAST6: $b+= $k[5] <<8; LAST5: $b+= $k[4]; LAST4: $a+= $k[3] <<24; LAST3: $a+= $k[2] <<16; LAST2: $a+= $k[1] <<8; LAST1: $a+= $k[0]; LAST0: mix($a,$b,$c); return $c; } sub hash2 { my ($key,$init) = @_; my($a,$b,$c) = (GOLDEN_RATIO, GOLDEN_RATIO, $init); my ($p, $len) = (0, length $key); do { my($x,$y,$z) = unpack 'V3', substr($key, $p, 12); mix($a+=($x||0), $b+=($y||0), $c+=($z||$len)); $p+=12; } while $p <= $len; return $c; } my %freq; my $n=0; $n++, push @{$freq{hash( $_, 0)}}, $_ for 'AAAA' .. 'ZZZZ'; print $n, ':', scalar keys %freq; undef(%freq); $n=0; $n++, push @{$freq{hash2( $_, 0)}}, $_ for 'AAAA' .. 'ZZZZ'; print $n, ':', scalar keys %freq;

Benchmark results

c:\test>hash 456976 values resulted in 233710 unique hashes & a maximum of 34 first + pass collisions 456976 values resulted in 233710 unique hashes & a maximum of 34 first + pass collisions 1 trial of hash (195.751s total) 1 trial of hash2 (133.032s total)

Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Replies are listed 'Best First'.
Re: Eek! goto?
by Abigail-II (Bishop) on Feb 12, 2003 at 16:09 UTC
    Since you already have the C code, it shouldn't be too hard to write an Inline::C version. Even if for some reason you don't want to end up with an Inline::C version for production or distribution, you could use the Inline::C function for testing - both the C and the Perl function should return the same values.

    And it would also be interesting to benchmark the two.

    Abigail

      That would be ideal. Unfortunately I am still hamstrung trying to build my own version of Perl using MingW and don't have the money or inclination to shell out for MSVC++ (which is what it appears AS use to build their distributions).

      The problem I have hit is

      Info: resolving __sys_nerr by linking to __imp___sys_nerr (auto-import +) fu000001.o(.idata$3+0xc): undefined reference to `libmsvcrt_a_iname' nmth000000.o(.idata$4+0x0): undefined reference to `_nm___sys_nerr' dmake.exe: Error code 1, while making '..\miniperl.exe'

      Which from google appears to be a problem with the MinGW libraries rather than the perl source as I found evidence of similar problems when it's used to build various other things including CygWin, Ruby and Python. I've found nothing that relates to this problem when building perl, but whilst the docs suggest building Perl for Win32 using MinGW and DMAKE is possible, I've found little real information on the subject so far, and my requests here for assistance have been met with a stoney silence.

      It seems noone else has succeeded doing this either...?


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Re: Re: Eek! goto?
by pfaut (Priest) on Feb 12, 2003 at 15:58 UTC

    Looks like a typo in hash(). Should the third line inside the while loop be $c = ...?

    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://234705]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (4)
As of 2024-03-28 20:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found