Problems? Is your data what you think it is?
See 386498. Then point me at the vulnerability that this discovery opens up.
I recently ran a process that produced 100 million MD5s of randomly generated data, and I hit duplicates in that process on at least two runs. For my purposes, using the MD5 digest as a hashing function, I simply appended a space to the end of the duplicated text to maintain uniqueness. For that application, the trailing whitespace was irrelevant.
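The append-a-space trick can be sketched as follows. This is an illustrative Python sketch using `hashlib` (the original process was presumably Perl's Digest::MD5); the input string is made up for the example. Any change to the input, even trailing whitespace, yields an unrelated digest:

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical record; when its digest collides with an earlier one,
# appending a space produces a completely different digest.
text = b"some randomly generated record"
print(md5_hex(text))
print(md5_hex(text + b" "))  # differs from the digest above
```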
But 2 runs out of the 4 or 5 runs of 100 million each showed duplicates, with a total runtime of less than 24 hours. Was I just extraordinarily (un)lucky? I don't think so. As I said in another thread recently, stats ain't my strong suit, but I think that the odds of generating two matching pairs from 500 million are probably well within statistical norms.
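For reference, the odds can be estimated with the standard birthday approximation, p ≈ 1 − exp(−n(n−1)/2^129) for an ideal 128-bit hash. A back-of-the-envelope sketch:

```python
import math

def birthday_collision_prob(n: int, bits: int) -> float:
    """Approximate probability of at least one collision among n
    uniformly random outputs of an ideal `bits`-bit hash function.
    Uses the standard approximation p ~ 1 - exp(-n(n-1)/2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

# Probability of any collision among 500 million ideal 128-bit digests
p = birthday_collision_prob(500_000_000, 128)
print(p)  # on the order of 1e-22
```

Under that model the chance of a collision at this scale is astronomically small, which is one reason to double-check that the inputs really were distinct (per the title: is your data what you think it is?).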
However, if you gave me an MD5 and asked me to find a plaintext that matched it, without giving me the plaintext you had used to generate it, that would be computationally infeasible. This, I believe, is what the MD5 algorithm is intended to achieve.
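That asymmetry is easy to demonstrate: exhaustive search for a preimage only works when you already know the preimage lives in a tiny space. A minimal Python sketch (the helper name and test string are my own, for illustration); a full preimage search would be on the order of 2^128 hash evaluations:

```python
import hashlib
import string
from itertools import product

def brute_force_preimage(target_hex: str, max_len: int = 4,
                         alphabet: str = string.ascii_lowercase):
    """Exhaustively try short lowercase strings until one hashes to
    target_hex. Feasible only because the search space is tiny."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars).encode()
            if hashlib.md5(candidate).hexdigest() == target_hex:
                return candidate
    return None

# Succeeds only because we know the preimage is a short lowercase word:
target = hashlib.md5(b"cat").hexdigest()
print(brute_force_preimage(target))  # b'cat'
```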
But where is the vulnerability if you need the original plaintext in order to generate the new, colliding plaintext?
Alternatively, take my trojan binary and the MD5 of some trusted piece of code, and then tell me what bytes I need to insert into data space (and where) within that binary in order for its MD5 to match that of the trusted software. That would be a vulnerability that would make me consider MD5 broken.
In reply to Re^6: On showing the weakness in the MD5 digest function and getting bitten by scalar context