PerlMonks  

Re: Hashing urls with Adler32

by Tomte (Priest)
on May 31, 2007 at 14:40 UTC ( [id://618499]=note )


in reply to Hashing urls with Adler32

Adler32 produces an output of fixed length (32 bits) for an input of arbitrary length. So there is an infinite set of possible inputs mapped to a finite set of checksums, which means the algorithm can't produce a unique checksum for every URL you feed it. I suggest you read the article on Wikipedia and then have a look at SHA1 as possibly the better solution. SHA1 will not be injective either (it will produce collisions too, just far less often, since its output is 160 bits). Whether this is a hindrance depends on your problem at hand.
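To make the collision argument concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself has no code). It shows two made-up URLs on the placeholder host `example.com` that share an Adler32 checksum, and it estimates the chance of at least one collision among n inputs with the usual birthday approximation. The helper name `collision_probability` and the figure of 100,000 URLs are assumptions for illustration, and the estimate assumes an ideal, uniformly distributed hash, which Adler32 is not for short inputs.

```python
import math
import zlib


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision).

    Assumes an ideal hash whose outputs are uniform over 2**bits values.
    """
    x = n_items * (n_items - 1) / 2 / 2.0**bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-x)


# Two different (made-up) URLs with the same Adler32 checksum:
# the byte sum and the position-weighted byte sum of "aca" and "bab" match.
u1 = b"http://example.com/aca"
u2 = b"http://example.com/bab"
print(zlib.adler32(u1) == zlib.adler32(u2))  # True

# Estimated collision probability for 100,000 URLs at different digest widths.
for bits in (32, 128, 160):
    print(bits, collision_probability(100_000, bits))
```

Under these assumptions a 32-bit checksum is more likely than not to collide somewhere in a set of 100,000 URLs, while at 128 or 160 bits the estimate is vanishingly small.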

regards,
tomte


An intellectual is someone whose mind watches itself.
-- Albert Camus

Replies are listed 'Best First'.
Re^2: Hashing urls with Adler32
by isync (Hermit) on May 31, 2007 at 15:16 UTC
    Currently I am using MD5 as digest, but with lots of urls the data structure is growing big.

    So I thought about reducing the bits per url and using adler32 instead.

    BTW: I am implementing a url-seen structure here and need the hash to check against, while minimizing false positives/negatives.
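One middle ground between a full MD5 digest and a 32-bit checksum is to keep only a prefix of the MD5 digest per URL. The following is a rough sketch of such a url-seen structure in Python, not the poster's actual code: the class name `SeenUrls`, the method `check_and_add`, and the default of 8 bytes per URL are all hypothetical choices. A structure like this cannot give false negatives, but a shorter prefix raises the chance of false positives.

```python
import hashlib


class SeenUrls:
    """Hypothetical url-seen set that stores a truncated MD5 digest per URL."""

    def __init__(self, digest_bytes: int = 8):
        # Assumed trade-off: fewer bytes per URL, more false positives.
        self.digest_bytes = digest_bytes
        self._seen = set()

    def _key(self, url: str) -> bytes:
        # Keep only the first digest_bytes bytes of the 16-byte MD5 digest.
        return hashlib.md5(url.encode("utf-8")).digest()[: self.digest_bytes]

    def check_and_add(self, url: str) -> bool:
        """Return True if the URL (or a colliding one) was seen before."""
        key = self._key(url)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


seen = SeenUrls()
print(seen.check_and_add("http://example.com/a"))  # False, first sighting
print(seen.check_and_add("http://example.com/a"))  # True, already recorded
```

How much memory this really saves depends on how the keys are stored; the sketch only illustrates the bits-per-URL versus false-positive trade-off.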
