note
eyepopslikeamosquito
<P>
Yes, I believe you are correct.
</P>
<P>
From <a href="http://svn.python.org/projects/python/trunk/Objects/stringobject.c">stringobject.c</a>:
<CODE>
static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
</CODE>
we can see that what matters is not whether the platform itself is 64-bit,
but whether the <C>long</C> type
used by the C compiler that Python was built with is 64-bit.
For Python built with a 32-bit <C>long</C> my solution should work; for a 64-bit <C>long</C> it will not.
</P>
<P>
On 64-bit architectures, Windows C compilers tend to use the LLP64 programming model (32-bit long),
while most others tend to use the LP64 model (64-bit long).
From this <a href="https://stackoverflow.com/questions/9689049/what-decides-the-sizeof-an-integer">Stack Overflow question</a>:
<blockquote>
The true "war" was for sizeof(long), where Microsoft decided
for sizeof(long) == 4 (LLP64) while nearly everyone else decided for sizeof(long) == 8 (LP64).
Note that a programming model is a choice made on a per-compiler basis,
and several can coexist on the same OS. However, the programming model
chosen as the primary model for the OS API typically dominates.
</blockquote>
</P>
<P>
Hmmm, I see from this later <a href="http://hg.python.org/cpython/file/33a39dfc239e/Objects/stringobject.c">stringobject.c</a>
that <C>_Py_HashSecret_*</C> has been added, presumably to protect against DoS attacks that exploit hash collisions in Python dictionaries.
<CODE>
static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

#ifdef Py_DEBUG
    assert(_Py_HashSecret_Initialized);
#endif
    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    /*
      We make the hash of the empty string be 0, rather than using
      (prefix ^ suffix), since this slightly obfuscates the hash secret
    */
    if (len == 0) {
        a->ob_shash = 0;
        return 0;
    }
    p = (unsigned char *) a->ob_sval;
    x = _Py_HashSecret.prefix;
    x ^= *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    x ^= _Py_HashSecret.suffix;
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
</CODE>
</P>
<P>
See also:
<ul>
<li> <a href="https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models">64-bit Data Models</a> (Wikipedia)
<li> <a href="https://unix.org/version2/whatsnew/lp64_wp.html">64-bit Programming Models: Why LP64?</a> (unix.org)
<li> <a href="https://stackoverflow.com/questions/9689049/what-decides-the-sizeof-an-integer">What decides the sizeof an integer?</a> (SO)
</ul>
</P>
<P>
<ul>
<li> <a href="https://mail.python.org/pipermail/python-dev/2011-December/115116.html">Python Dev Mailing List: Hash collision security issue</a>
<li> <a href="https://bugs.python.org/issue13703">Python Bug Tracker Issue 13703: Hash collision security issue</a>
</ul>
</P>
<P>
<ul>
<li> <a href="https://rt.perl.org/Public/Bug/Display.html?id=22371">Perl bug report #22371 (Algorithmic Complexity Attack)</a> (2003)
<li> <a href="https://blog.booking.com/hardening-perls-hash-function.html">Hardening Perl's hash function</a> (2013)
<li> <a href="https://www.nntp.perl.org/group/perl.perl5.porters/2013/03/msg199755.html">P5P thread: CVE-2013-1667: important rehashing flaw</a> (2013)
<li> <a href="https://131002.net/siphash/">SipHash: a fast short-input PRF</a>
</ul>
</P>
<P>
<ul>
<li> [id://11135514]
</ul>
</P>
1086650
1086727