in reply to Hashing urls with Adler32

Adler-32 produces an output of fixed length (32 bits) for an input of arbitrary length. An infinite set of possible inputs is therefore mapped to a finite set of checksums, so the algorithm can't produce a unique checksum for every URL you feed it. I suggest you read the article on Wikipedia and then have a look at SHA-1 as a possibly better solution. SHA-1 will not be injective either (it can produce collisions too, only with far lower probability), so whether this is a hindrance depends on your problem at hand.
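To make the collision point concrete, here is a minimal sketch in Python (not Perl, purely for illustration) using the standard `zlib.adler32` and `hashlib.sha1`. The two example.com URLs are made up for the demonstration; they differ only in their last three characters, chosen so that both of Adler-32's running sums come out equal.

```python
import hashlib
import zlib

# Hypothetical URLs, constructed by hand to collide under Adler-32:
# the byte changes (+1, -2, +1) cancel out in both running sums.
URL_A = b"http://example.com/aca"
URL_B = b"http://example.com/bab"


def adler32_hex(data: bytes) -> str:
    """Return the Adler-32 checksum of data as 8 hex digits."""
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hex digits."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    for url in (URL_A, URL_B):
        print(url.decode(), adler32_hex(url), sha1_hex(url))
```

Both URLs get the same Adler-32 value, while their SHA-1 digests differ. SHA-1 collisions exist as well, they are just vastly harder to hit by accident.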


An intellectual is someone whose mind watches itself.
-- Albert Camus

Re^2: Hashing urls with Adler32
by isync (Hermit) on May 31, 2007 at 15:16 UTC
    Currently I am using MD5 as the digest, but with lots of URLs the data structure is growing large.

    So I thought about reducing the bits per URL by using Adler-32 instead.

    BTW: I am implementing a URL-seen structure here and need the hash to check against, while minimizing false positives/negatives.
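
One way to weigh bits per URL against false positives is sketched below, again in Python for illustration only. The class `SeenUrls`, its `digest_bytes` parameter and the helper `any_collision_probability` are hypothetical names, not part of any existing module. The idea is to keep MD5 but store only a prefix of each digest. The estimate uses the usual birthday approximation and assumes uniformly distributed hash values, which Adler-32 does not deliver for short inputs such as URLs.

```python
import hashlib
import math


class SeenUrls:
    """URL-seen set that stores only the first digest_bytes bytes of MD5."""

    def __init__(self, digest_bytes: int = 8):
        self.digest_bytes = digest_bytes
        self._seen = set()

    def _key(self, url: str) -> bytes:
        # Truncating the digest saves memory at the cost of more collisions.
        return hashlib.md5(url.encode("utf-8")).digest()[: self.digest_bytes]

    def add(self, url: str) -> bool:
        """Record url; return True if its key was already present."""
        key = self._key(url)
        already_seen = key in self._seen
        self._seen.add(key)
        return already_seen


def any_collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that n uniform hashes of the
    given width contain at least one colliding pair."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, any_collision_probability(1_000_000, bits))
```

A structure like this never reports a stored URL as unseen; the only error is a false positive when two different URLs share a key. Under the uniformity assumption, a million URLs at 32 bits are almost certain to contain at least one such pair, while at 64 bits the chance is tiny.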