http://www.perlmonks.org?node_id=386281


in reply to On showing the weakness in the MD5 digest function and getting bitten by scalar context

To find another string that maps to the same digest (i.e. a collision in the MD5-space) should theoretically take 2^128 steps.

Actually that is right and wrong. Hash collisions conform to the Birthday Paradox which says that if you have 23 people in a room there is a 50% probability that 2 people will share a birthday (have a collision). This 'paradox' applies to hashes as well. For example you have a 50% probability of a collision in a 32 bit hash after a mere 77163 tries (roughly 2^16 ie nothing like 2^32)

Update

Corrected technical inexactitude. The birthday paradox is the correct answer if you simply want a collision. To get a specific collision with a given digest it is not.

cheers

tachyon

  • Comment on Re: On showing the weakness in the MD5 digest function and getting bitten by scalar context

Replies are listed 'Best First'.
Re^2: On showing the weakness in the MD5 digest function and getting bitten by scalar context
by Velaki (Chaplain) on Aug 27, 2004 at 13:07 UTC

    Not a paradox, merely simple probability. However, it does assume that people are born with equal likelihood throughout the year. But its solution will lead to the thought of a "paradox" in that the number of people that must be in the room seems less than what you'd expect. The trick here is whether you're looking for any two birthdays in the room to match, or a specific match to your own birthday. For this node, I will consider the former.

    Something will happen, or it won't. There is a 100% chance of that, or we can say that the probability of something happening or not happening is 1.0, so we can say that the probability of people having the same birthday is 1 - "not the same birthday".

    This would make a nice practice program, but you will probably need to use logarithms to perform the calculations if the numbers start to overflow.

    Hope this helped,
    -v
    "Perl. There is no substitute."
      Technically, you could find a collision in 1 try using a Monte Carlo method, i.e. random guessing. The change is extremely small, but not impossible. This is the case for *all* hashing no matter how large the hash space is.
Re^2: On showing the weakness in the MD5 digest function and getting bitten by scalar context
by danderson (Beadle) on Aug 27, 2004 at 21:47 UTC
    Edit: tachyon updated to fix. Leaving my comment for posterity, but it no longer applies.

    Sorry Tachyon, but you're wrong. In roughly 2^16 tries, you've got 50% odds of finding a collision - not 50% odds of finding a collision with a given result. So you have 50% odds of two numbers in the 2^16 tries colliding, but you do not have 50% odds of a number in the 2^16 tries colliding with the result of one specific output.

    You've mistaken "collision with a given point" with "collision between any two points." ++ for bringing up the birthday paradox, anyways.

      Of course you are right about finding a specific collision. I have had non specific collisions on my mind recently.

      cheers

      tachyon