Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^3: MD5 - what's the alternative

by danderson (Beadle)
on Aug 27, 2004 at 21:15 UTC ( #386495=note: print w/replies, xml ) Need Help??


in reply to Re^2: MD5 - what's the alternative
in thread MD5 - what's the alternative

"In fact, an infinite number of collisions!"
...
"Not trying to split hairs, but I would say a finite, large number but not infinite."

Sigh.

md5 is a reasonably good hash. As it's input # grows, even as it approaches infinity, there are no numbers in it's range (output space) that cease to be 'hit.' So, theoretically, you can feed it an infinite number of consecutive inputs, and some subset of them they will give you an infinite number of collisions on any given point on the output space.

But we're not talking about math in theory, we're talking about math in the real world. There are limits, based on speed of computation, memory size, disk size, etc. Based on these, there is a finite (though very large) number of possible md5 sums calculable in any given timeframe - even if that timeframe is "from the advent of the abacus to the heat death of the universe, when there's no entropy generatable and no work can be done."

Less facetiously, I'd say that the difficulty of computing md5 sums from, say, >1 Terabyte inputs means that there will be a very low number of collisions from inputs that high. Why bother, when you can get a collision from under-quadruple-digit bytes?

So, the answer is really 'both.' In theory, there's an infinite number of collisions for any md5 output. In practice, there certainly isn't, and the number of collisions that will be generated in our lifetimes is finite to the point of being understandable, and maybe even visualized, by our little human brains.

edit: more importantly, "Therefore the number of inputs that will map to any given md5 is infinity / 2**128" is incorrect. You're assuming even distribution from the domain to the range. This is not proven (otherwise given any consecutive set of (2**128)-1 elements, they'd cover the range of md5 minus one, and adding one more would cover the range entirely. Not yet proven to be true, and in fact quite unlikely). So division doesn't follow, thus while your conclusion is correct your path to get there isn't.

Replies are listed 'Best First'.
Re^4: MD5 - what's the alternative
by BrowserUk (Pope) on Aug 30, 2004 at 10:37 UTC
    edit: more importantly, "Therefore the number of inputs that will map to any given md5 is infinity / 2**128" is incorrect. You're assuming even distribution from the domain to the range. This is not proven (otherwise given any consecutive set of (2**128)-1 elements, they'd cover the range of md5 minus one, and adding one more would cover the range entirely. Not yet proven to be true, and in fact quite unlikely). So division doesn't follow, thus while your conclusion is correct your path to get there isn't.

    I just noticed your edit.

    I agree, that there is an implied (or should be) "on average" inserted into that statement.

    However, even if the mapping of inputs to outputs is a less than even distribution, and the value "2**128" is less (it cannot be more), you still end up with a sum of infinity /  somenumber, with the result that (on average) the number of (possible) messages that (would) map to any given md5, is infinity.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://386495]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (12)
As of 2019-10-17 13:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?