PerlMonks  

Re: Hashing urls with Adler32

by Tomte (Priest)
on May 31, 2007 at 14:40 UTC


in reply to Hashing urls with Adler32

Adler-32 produces an output of fixed length (32 bits) for an input of arbitrary length. So an infinite set of possible inputs is mapped to a finite set of checksums, and the algorithm cannot produce a unique checksum for every URL you feed it. I suggest you read the Wikipedia article and then have a look at SHA-1 as a possibly better solution. SHA-1 will not be injective either (it can produce collisions too, they are just far less likely), so whether this is a hindrance depends on your problem at hand.
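To make the collision point concrete, here is a minimal sketch in Python (used only for illustration, not something from the original question). The two URLs are made up; they differ in their last three bytes by (+1, -2, +1), a pattern that leaves both running sums of Adler-32 unchanged, so the checksums match while SHA-1 still tells the inputs apart.

```python
import hashlib
import zlib

# Hypothetical URLs: the last three bytes differ by (+1, -2, +1),
# which cancels out in both of Adler-32's running sums.
url_a = b"http://example.com/page/bdb"
url_b = b"http://example.com/page/cbc"

adler_a = zlib.adler32(url_a)
adler_b = zlib.adler32(url_b)
sha_a = hashlib.sha1(url_a).hexdigest()
sha_b = hashlib.sha1(url_b).hexdigest()

print(f"adler32: {adler_a:08x} vs {adler_b:08x}")
print("adler32 values equal:", adler_a == adler_b)
print("sha1 values equal:   ", sha_a == sha_b)
```

Since similar URLs often differ by only a few characters, this kind of structured collision is worth keeping in mind before using a simple checksum as an identifier.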


regards,
tomte


An intellectual is someone whose mind watches itself.
-- Albert Camus


Re^2: Hashing urls with Adler32
by isync (Hermit) on May 31, 2007 at 15:16 UTC
    Currently I am using MD5 as the digest, but with lots of URLs the data structure is growing large.

    So I thought about reducing the bits per URL by using Adler-32 instead.

    BTW: I am implementing a URL-seen structure here and need the hash to check against, while minimizing false positives and false negatives.
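
One way to trade memory against collision risk, sketched here in Python under assumptions of my own: keep MD5 but store only the first few bytes of each digest. The names `url_key`, `SeenUrls`, and `expected_collisions` are hypothetical helpers, and the 8-byte default is only an example. A set of hashes like this never gives a false negative (a URL that was added is always reported as seen); the only error is a false positive when two different URLs share a key.

```python
import hashlib


def url_key(url: str, nbytes: int = 8) -> bytes:
    """Return the first nbytes of the MD5 digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).digest()[:nbytes]


class SeenUrls:
    """Remember URLs by truncated digest instead of by full string."""

    def __init__(self, nbytes: int = 8):
        self.nbytes = nbytes
        self._keys = set()

    def add(self, url: str) -> bool:
        """Record the URL; return True if its key was already present."""
        key = url_key(url, self.nbytes)
        if key in self._keys:
            return True
        self._keys.add(key)
        return False


def expected_collisions(n_urls: int, bits: int) -> float:
    """Expected number of colliding pairs for a uniform hash of the given width."""
    return n_urls * (n_urls - 1) / 2 / 2**bits


seen = SeenUrls()
print(seen.add("http://example.com/"))   # first visit
print(seen.add("http://example.com/"))   # second visit
print(expected_collisions(10_000_000, 32))
print(expected_collisions(10_000_000, 64))
```

Under the uniform-hash assumption, ten million URLs with a 32-bit key give roughly 11,600 expected colliding pairs, while a 64-bit key gives far less than one. Adler-32 is not uniformly distributed on short inputs, so its real collision rate would be worse than the 32-bit estimate.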
