Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I'm trying to replicate a java hashing algorithm for a string in a Perl script. The way Java calculates the hash value is:
public int hashCode()
Returns a hashcode for this string. The hashcode for a String object is computed as
s[0]*31^(n-1)+s[1]*31^(n-2)+ ... +s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Can anyone help me with this.
Thanks
Re: Generating A Hash Value
by Chady (Priest) on Jul 03, 2003 at 11:30 UTC
|
# assuming that...
my $string = "Perl Monks";
my $n = length($string);
my $i = 1;
my $result = 0;
# here it is.
$result += ord($_) * 31^($n-$i++) for split '', $string;
^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | +-- String to split
| | | | | | +-- split all the charact
+ers
| | | | | +-- post increment $i
| | | | +-- $i is the index
| | | +--- $n is the length of $string
| | +--- this is the character returned from split
| +--- ord() returns the ascii code of the character
+--- add what we have to $result
He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
Chady | http://chady.net/ | [reply] [Watch: Dir/Any] [d/l] |
Re: Generating A Hash Value
by BrowserUk (Patriarch) on Jul 03, 2003 at 11:34 UTC
|
When you say "using int arithmetic"; signed or unsigned? 32-bit?
Could you supply the hash value generated for
"The quick brown fox jumps over the lazy dog" so that we may have something to check our attempts against.
This might be close, and reasonable efficient as it avoids the exponentiation by processing the string backwards and accumulating the exponent, and uses integer math. However truncating to 32-bits in reverse order maybe screwing up the result?
#! perl -slw
use strict;
# s[0]*31^(n-1)+s[1]*31^(n-2)+ ... +s[n-1]
sub JavaHash {
use integer;
my ($s) = @_;
my $n = length $s;
my @s = unpack 'C*', $s;
my $hash = 0;
my $p = 1;
for ( reverse @s ) {
print "$hash : $_: $p";
$hash += $_ * $p;
$p *= 31;
}
return $hash;
}
die 'No arguments' unless @ARGV;
printf '%10d:%s' , JavaHash( $_ ), $_ for @ARGV;
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
| [reply] [Watch: Dir/Any] [d/l] |
Re: Generating A Hash Value
by Tomte (Priest) on Jul 03, 2003 at 11:31 UTC
|
What exactly is your problem? you happen to have the algorithm, all you have to do is write it down in perl ;)
Here is a rather clumsy solution, since you don't provide a test-sample, I don't know if the results are correct:
use strict;
use warnings;
my @strings = ('abzdefghijk',
'a b z d e f g h i j k',
'',
'abzde fghijk',);
print $_, " => ", hc($_), "\n" for @strings;
sub hc {
my $string = shift;
my $erg = 0;
my $max = int(length($string) - 1);
# this line is your algorithm!
$erg += (ord(substr($string,$_,1)))*31**($max-$_) for 0..$max;
$erg;
}
I'd like a way to get rid of the substr ;)
regards,
tomte
Hlade's Law:
If you have a difficult task, give it to a lazy person --
they will find an easier way to do it.
| [reply] [Watch: Dir/Any] [d/l] |
Re: Generating A Hash Value
by bart (Canon) on Jul 03, 2003 at 12:54 UTC
|
It looks like what you need to do is: get the last value you had, multiply it by 31, and add the next number; repeat. In Perl that would become:
sub hashing {
my $s = shift;
my $acc = 0;
foreach my $c (unpack 'C*', $s) {
$acc *= 31;
$acc += $c;
}
return $acc;
}
You will have overflow problems, I think. Likely you should only retain the $n (<=32?) lower bits of $acc. Your remark on "using int arithmetic" also points in that direction. So I'll modify that into:
sub hashing {
my $s = shift;
my $acc = 0;
foreach my $c (unpack 'C*', $s) {
$acc *= 31;
$acc += $c;
$acc %= 0x100000000;
}
return $acc;
}
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Generating A Hash Value
by thor (Priest) on Jul 03, 2003 at 15:49 UTC
|
Though your question has been answered, I have to wonder if this is really Java's hashing algorighm. It seems as though having an arbitrarily long hash key would be unwieldy. Though, if the algorithm is really that simple, I suppose it'd be fast...
thor | [reply] [Watch: Dir/Any] |
|
Two things:
- You don't have to wonder, you can read it for yourself.
- Seondly, and more importantly, the way the hashCode() method works in java allows for objects of any type to serve as hash keys. The original poster asked a question about the implementation of that method for Strings. (also: Perl allows for arbitrarily long hash keys, too, since any strings are hash keys, and strings are arbitrarily long)
- Three things: there is no third thing.
| [reply] [Watch: Dir/Any] |
|
To address your items, point by point:
- I could have, but I'm lazy. And laziness is a virtue.
- I understand what purpose hashCode() has. However, in perl, hash codes are not of arbitrary length. There is an algorithm described here that describes an algorithm that takes an arbitrary input and produces a 32-bit hash code. A bit slower, I suppose, but keeps you from having to keep track of an arbitrarily large integer for your hash key. So, it's not necessarily the case that one use arbitrarily large hash codes (which is different from arbitrarily large hash keys)
- Good, because I don't either...:).
thor
| [reply] [Watch: Dir/Any] |
|
|