I recently ran a process that produced 100 million md5s of randomly generated data. I hit duplicates in that process on at least two runs.
Are you absolutely certain that the random data itself contained no duplicates? I would be very interested to see these MD5 collisions of yours.
I think that the odds of generating 2 matching pairs from 500 million are probably well within statistical norms.
Sure. If you reduce MD5 to about 30 bits.
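For anyone who wants to plug in their own numbers, here is a rough sketch of the birthday arithmetic behind that reply. It assumes the hashes behave like independent, uniformly random values of a given bit width; the function name is made up for illustration.

```python
from math import comb

def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n independent, uniformly
    random hashes of the given bit width (an idealized model, not MD5 itself)."""
    # There are C(n, 2) pairs, and each pair matches with probability 2**-bits.
    return comb(n, 2) / 2**bits

# Sample sizes mentioned in the thread, at MD5's full 128-bit width.
for n in (100_000_000, 500_000_000):
    print(n, expected_colliding_pairs(n, 128))
```

Under that model the expected count at 128 bits is far below one for either sample size, so accidental duplicates in a run like this point at the input data rather than at MD5. Passing a smaller `bits` value shows how short the hash would have to be before duplicates become routine.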
However, if you gave me an md5 and asked me to find a plaintext that matched it, without giving me the plaintext you had used to generate it, that would be computationally infeasible. This, I believe, is what the md5 algorithm is intended to achieve.
This is known as the Preimage Problem. It's much more difficult than the Collision Problem (finding two messages with the same MD5). Cryptographic hashes are supposed to prevent someone from doing either one. You can answer "no they aren't" again if you want, but you will still be wrong.
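To make the gap between the two problems concrete, here is a toy sketch that truncates MD5 to 16 bits so both searches finish quickly. The 16-bit width, the helper names, and the counter-based messages are all assumptions for illustration; this is generic brute force, not an attack on full MD5.

```python
import hashlib
from itertools import count

BITS = 16  # toy width chosen for illustration; real MD5 output is 128 bits

def tiny_md5(msg: bytes) -> int:
    """Top BITS bits of the MD5 digest, as an integer."""
    digest = hashlib.md5(msg).digest()
    return int.from_bytes(digest, "big") >> (128 - BITS)

def find_collision():
    """Collision problem: find any two distinct messages with the same hash."""
    seen = {}
    for i in count():
        msg = b"c%d" % i
        h = tiny_md5(msg)
        if h in seen:
            return seen[h], msg, i + 1  # both messages, number of hashes tried
        seen[h] = msg

def find_preimage(target: int):
    """Preimage problem: find a message that hashes to one fixed target."""
    for i in count():
        msg = b"p%d" % i
        if tiny_md5(msg) == target:
            return msg, i + 1  # message, number of hashes tried

a, b, collision_tries = find_collision()
target = tiny_md5(b"trusted binary")
forged, preimage_tries = find_preimage(target)
print("collision after", collision_tries, "hashes")
print("preimage after", preimage_tries, "hashes")
```

With a b-bit hash, brute-force collision search takes on the order of 2^(b/2) hashes while brute-force preimage search takes on the order of 2^b, which is why the collision problem is the easier target.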
Alternatively, take my trojan binary and the md5 from some trusted piece of code, and then tell me what bytes I need to insert into data space (and where) within that binary in order for its md5 to match that of the trusted piece of software. That would be a vulnerability that would make me consider md5 broken.
There are more uses of MD5 than are dreamt of in your philosophy, Horatio.