Re: Hashing urls with Adler32

by Tomte (Priest)
on May 31, 2007 at 14:40 UTC

in reply to Hashing urls with Adler32

Adler32 produces an output of fixed length (32 bits) for an input of arbitrary length. So an infinite set of possible inputs is mapped to a finite set of checksums, and the algorithm can't produce a unique checksum for every URL you feed it. I suggest you read the article on Wikipedia and then have a look at SHA1 as possibly the better solution. SHA1 will not be injective either (collisions are still possible, just far less likely with a 160-bit digest); whether that is a hindrance depends on your problem at hand.
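To make the collision argument concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself contains no code). The two example URLs and the helper name `expected_colliding_pairs` are assumptions made up for this example. The URLs differ in three characters, yet they share both the byte sum and the position-weighted byte sum that Adler32 is built from, so they get the same checksum. The helper applies the usual birthday approximation to estimate how many colliding pairs to expect for a given number of URLs and digest width.

```python
import zlib


def expected_colliding_pairs(n_items: int, bits: int) -> float:
    """Birthday approximation: expected number of colliding pairs
    when n_items values are spread uniformly over 2**bits slots."""
    return n_items * (n_items - 1) / 2 / 2**bits


# Hypothetical URLs. "aca" and "bab" have the same byte sum and the same
# weighted byte sum, so Adler32 cannot tell them apart.
url_a = b"http://example.com/aca"
url_b = b"http://example.com/bab"
same_checksum = zlib.adler32(url_a) == zlib.adler32(url_b)
print(same_checksum)  # True

# Estimated colliding pairs among one million URLs, assuming an ideal
# 32-bit hash versus an ideal 128-bit hash (the size of an MD5 digest).
print(expected_colliding_pairs(1_000_000, 32))
print(expected_colliding_pairs(1_000_000, 128))
```

Even for an ideal 32-bit hash, the estimate for a million URLs is on the order of a hundred colliding pairs. Adler32 is likely to do worse than that on short, similar strings such as URLs, as the hand-built pair above suggests.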


An intellectual is someone whose mind watches itself.
-- Albert Camus

Replies are listed 'Best First'.
Re^2: Hashing urls with Adler32
by isync (Hermit) on May 31, 2007 at 15:16 UTC
    Currently I am using MD5 as digest, but with lots of urls the data structure is growing big.

    So I thought about reducing the bits per url and using adler32 instead.

    BTW: I am implementing a url-seen structure here and need the hash to check against, while minimizing false positives/negatives.
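One possible middle ground between a full MD5 digest and a 32-bit checksum is to keep only a prefix of the MD5 digest. The following is a rough sketch in Python, not a description of the structure actually used in this thread; the class name `SeenUrls`, the method `check_and_add`, and the default of 8 bytes are all assumptions made for illustration.

```python
import hashlib


class SeenUrls:
    """URL-seen set that stores only a truncated MD5 digest per URL.

    Hypothetical sketch: digest_bytes trades memory per URL against
    the chance that two different URLs share the same stored key.
    """

    def __init__(self, digest_bytes: int = 8):
        if not 1 <= digest_bytes <= 16:  # an MD5 digest is 16 bytes long
            raise ValueError("digest_bytes must be between 1 and 16")
        self.digest_bytes = digest_bytes
        self._seen = set()

    def _key(self, url: str) -> bytes:
        # Keep only the first digest_bytes bytes of the MD5 digest.
        return hashlib.md5(url.encode("utf-8")).digest()[: self.digest_bytes]

    def check_and_add(self, url: str) -> bool:
        """Return True if the URL's key was already stored, else store it."""
        key = self._key(url)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


seen = SeenUrls()
print(seen.check_and_add("http://example.com/"))  # False, first sighting
print(seen.check_and_add("http://example.com/"))  # True, seen before
```

A structure like this never reports a URL it has stored as unseen, so there are no false negatives. False positives occur only when two different URLs share the same truncated digest, and the birthday estimate with `bits = 8 * digest_bytes` gives a feel for how often that might happen.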
