kiat has asked for the wisdom of the Perl Monks concerning the following question:
In reference to the node On showing the weakness in the MD5 digest function and getting bitten by scalar context by grinder.
I'm using Digest::MD5 to generate unique cookie values as well as for password authentication.
Should I or should I not continue to use Digest::MD5 for those operations? What's the alternative?
Thanks in anticipation :)
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: MD5 - what's the alternative
by ikegami (Patriarch) on Aug 27, 2004 at 05:46 UTC | |
| [reply] |
Re: MD5 - what's the alternative
by dws (Chancellor) on Aug 27, 2004 at 05:57 UTC | |
I'm using Digest::MD5 to generate unique cookie values as well as for password authentication. Should I or should I not continue to use Digest::MD5 for those operations? The vulnerability is that if a 3rd party intercepts the MD5 hash, they can spend a few days of compute time to discover a plain-text input that will produce the same hash. You can mitigate or effectively eliminate the threat by limiting the length plaintext passwords (say, 12 to 16 characters). Then, even if an attacker finds a longer text string that results in the same MD5 hash, they're cut off by the limit. However, unless you're mixing user-supplied plaintext with some secret string before generating a hash, you're open to dictionary attacks. MD5 is way down on the list of things I'm worrying about right now, but there's always the chance that I'm being naive. | [reply] |
by ctilmes (Vicar) on Aug 27, 2004 at 12:05 UTC | |
If user supplies PW, which gets hashed to PW', and a 3rd party gets PW', the weakness allows them to discover another password that also hashes to PW'. If you add additional bits to the supplied PW -- PWpri, and hash that to PWpri', which the attacker gets, and using the techniques described, comes up with some bits that also hash to PWpri', they still can't come up with a valid PW that when pri gets added to it also produce PWpri'. Of course, if your security is such that an attacker can discover PWpri', they can probably find pri out anyway, and you're probably owned at that point anyway, so discovering PW is the least of your concerns. | [reply] |
by beable (Friar) on Aug 27, 2004 at 08:22 UTC | |
| [reply] |
by fergal (Chaplain) on Aug 27, 2004 at 09:55 UTC | |
| [reply] |
Re: MD5 - what's the alternative
by BrowserUk (Patriarch) on Aug 27, 2004 at 06:40 UTC | |
Take this with a large pinch of salt as IANAM, but...I think that the 'problem' is being over-stated. The upshot of it is that the time taken to find a piece of text that will produce the same md5 checksum is cut from a notional "few million years" to "a few days". What seems to have been over looked is that there is no guarentee that the piece of text with the same checksum, is the same as the text that produced the checksum that is being attacked. In fact, it is most unlikely to be so. It is fairly obvious that any hashing algorithm that is used to map any number of arbitrary length pieces of data to some fixed size number, will produce collisions--lots of them. In fact, an infinite number of collisions! But in order to 'crack' a given checksum, you don't need to find a piece of text that produces the sort after checksum. You need to find the piece that was used to produce the sort after checksum. That means you need to produce every piece of text that can produce the given checksum, and then decide which of that (infinite number of possibles), is the piece that your trying to decode. The only real risk with using md5 is that it is possible that you might generate the same checksum from two concurrently active sessions (or other use). The answer to this is to generate your md5, look in your database to see if it is already active, and generate a new one (for example: add a random number that isn't used to the end of whatever you are encoding). Rinse and repeat until you get an md5 that is unique within your database. Chances are in practice, this collision will rarely if ever happen, but whenit does, your code within then deal with it. | [reply] |
by blm (Hermit) on Aug 27, 2004 at 07:24 UTC | |
It is fairly obvious that any hashing algorithm that is used to map any number of arbitrary length pieces of data to some fixed size number, will produce collisions--lots of them. In fact, an infinite number of collisions! Not trying to split hairs, but I would say a finite, large number but not infinite. There is a difference: if it was infinite you could keep finding collisions forever but if it is finite there would be a time that you would stop finding collisions. And I don't think that it is overstated. It has often been said that there are collisions in MD5. Its strength was that it took a long time and/or alot of computing power to find the thing that generates a particular hash which i will call plaintext. Now that time is lessened. It doesn't matter if one finds the original plain text or an alternative the generates the same hash the computer will not know the difference. eg MD5 is used for passwords on linux machines. No matter what if the plaintext at logon matches the hash in the shadow file login will be granted whether it was "the original plaintext" or a "collision plaintext" Update:See posts by BrowserUk and adrianh below. I was thinking the collisions were restricted to the set of possible hashes. Bizarre! | [reply] |
by BrowserUk (Patriarch) on Aug 27, 2004 at 08:41 UTC | |
a finite, large number but not infinite. md5 hashing can be applied to inputs of any length. "Any length" means an infinite number of possible inputs. Adapting the old schoolboy definition of infinite, however many inputs you have hashed, you can always add one more byte to any one of them to get one more. Infinite. Every input in this range of inputs will be mapped to one of the 2**128 possible md5s. Therefore the number of inputs that will map to any given md5 is infinity / 2**128 Which is infinity. It has often been said that there are collisions in MD5. Of course there are. It could not be otherwise. Whenever the range of inputs is greater than the range of outputs, there will always be collisions. It simply is not possible for it to be otherwise. (Would that it were, it would make a brilliant compression algorithm:) Theoretically, if you generated the md5s for all of the (16-byte binary encoded) integers 0 .. 2**128 -1, you might not find any duplicates. That is to say, the input range equals the output range, so it is possible that the md5 algorithm would be a 'perfect hash' for the input data. And if you double the length of the input data to 32 bytes, then (theoretically) there will be exactly 2 inputs (plaintexts) for every output (md5). I don't think that this "perfect hash" status has ever been confirmed (or even suggested). It would simply take too long to verify. The example of duplicate plaintexts posted were 128 (8-bit) bytes each. 2**1024 inputs / 2**128 outputs. There will be at least 8 duplicate, 128-byte inputs for every md5. That makes it much easier to find a duplicate. As far as your Linux passwords are concerned, not only would the cracker need to generate a plaintext that gave the same MD5, it would also need to be of a length that the password verification process would accept as a password. Whilst accepting longer passwords generally gives greater security, that is only true up to a certain limit. With md5, that limit is 16 (8-bit) bytes. If the plaintexts where restricted to 32 x 8-bit bytes, then there will be 2**256 inputs and 2**128 outputs. Therefore there *must be* at least 2 x 32-byte inputs that will produce every md5. If the plaintext is restricted to 16 x 8-bit bytes, then it is possible (but not proven), that the are no duplicates. If the inputs are restricted to 12 (8-bit bytes), then it is very unlikely that it will be duplicates. If you further restrict that to 7-bit ASCII, even less likelyhood. Exclude control characters, less again. This is one of those cases where more isn't always better. Actually, in my experience, more is very rarely better in computing:) | [reply] [d/l] [select] |
by adrianh (Chancellor) on Aug 27, 2004 at 08:33 UTC | |
Not trying to split hairs, but I would say a finite, large number but not infinite. Erm. Nope. Map an infinite number of things into a finite number of boxes and you'll get an infinite number of collisions. Dividing infinity by N just gives you infinity, not "a finite, large number". | [reply] |
by beable (Friar) on Aug 27, 2004 at 08:17 UTC | |
| [reply] |
by BrowserUk (Patriarch) on Aug 27, 2004 at 10:06 UTC | |
by beable (Friar) on Aug 27, 2004 at 11:19 UTC | |
| |
by danderson (Beadle) on Aug 27, 2004 at 21:15 UTC | |
... "Not trying to split hairs, but I would say a finite, large number but not infinite." Sigh. md5 is a reasonably good hash. As it's input # grows, even as it approaches infinity, there are no numbers in it's range (output space) that cease to be 'hit.' So, theoretically, you can feed it an infinite number of consecutive inputs, and some subset of them they will give you an infinite number of collisions on any given point on the output space. But we're not talking about math in theory, we're talking about math in the real world. There are limits, based on speed of computation, memory size, disk size, etc. Based on these, there is a finite (though very large) number of possible md5 sums calculable in any given timeframe - even if that timeframe is "from the advent of the abacus to the heat death of the universe, when there's no entropy generatable and no work can be done." Less facetiously, I'd say that the difficulty of computing md5 sums from, say, >1 Terabyte inputs means that there will be a very low number of collisions from inputs that high. Why bother, when you can get a collision from under-quadruple-digit bytes? So, the answer is really 'both.' In theory, there's an infinite number of collisions for any md5 output. In practice, there certainly isn't, and the number of collisions that will be generated in our lifetimes is finite to the point of being understandable, and maybe even visualized, by our little human brains. edit: more importantly, "Therefore the number of inputs that will map to any given md5 is infinity / 2**128" is incorrect. You're assuming even distribution from the domain to the range. This is not proven (otherwise given any consecutive set of (2**128)-1 elements, they'd cover the range of md5 minus one, and adding one more would cover the range entirely. Not yet proven to be true, and in fact quite unlikely). So division doesn't follow, thus while your conclusion is correct your path to get there isn't. | [reply] |
by BrowserUk (Patriarch) on Aug 30, 2004 at 10:37 UTC | |
by Aristotle (Chancellor) on Aug 29, 2004 at 21:28 UTC | |
You miss the point. The situation in which the "panic" applies is that I have received a message, and I have a trusted MD5 checksum of the message originally sent. (In practice, the checksum is protected using public key cryptography.) The message I received hashes to the same MD5 checksum as that of the original message. How certain can I be that the message has not been altered in transit? If an attacker can find a collision in reasonable time, he can pad a modified (or completely different) version of the message such that it hashes to the original checksum, and I can no longer trust the message I received any more than I could without the checksum. In other words, a cryptographic signature is worthless if the hashing function is weak. And it seems that MD5 has turned out to be weak. That doesn't make it entirely useless. There are many scenarios outside cryptographic signatures where it is still useful. Makeshifts last the longest. | [reply] |
by BrowserUk (Patriarch) on Aug 29, 2004 at 22:08 UTC | |
No. I haven't missed the point. I'm not sure where you got the quoted word "panic" from, but it wasn't any of my posts. How certain can I be that the message has not been altered in transit? Let's just assume for a moment that you received your "trusted md5" via secure means. How would the bad guys know what is was in order to create a message that hashed to that same MD5? If the MD5 was not transmitted to you by secure means, then it's an aweful lot easier to alter both the message and the md5. ... he can pad a modified (or completely different) version of the message such that it hashes to the original checksum, ... This is completely wrong! The attack consists of altering bits of bytes of the original message to produce a duplicate message. The results will be the same length as the original, with a few bits altered. Hence, if the message is plain text, it will show obvious signs of tampering. Wrong letters in words, accented characters that don't fit. It will probably look as though it has suffered from corruption in transit with a few bits having been dropped or switched. Chances are that the intent of the original message would be almost intact. What the attacker cannot do, is change it to something specific. The "weakness" in the md5 digital (not cryptographic) signature , certainly does not allow the attacker to "pad a modified (or completely different) version of the message such that it hashes to the original checksum". If the message is binary, either compressed, encrypted or both, the effect of the bits changed by the mathematical manipulations, will likely render an executable unrunnable; a compressed file undecompressible; and an encrypted file undecryptable. These format being more more sensitive to random bit corruption than plain text. Nowhere, in any of the publically available material that I have been able to access over the last couple of days does is suggest (or even hint) that it is possible to replace one message with another of entirely different meaning and then coerse it to produce the original md5. And I believe I've read everything available to read. Your suggestion that this is possible, shows a distinct lack of understanding of the processes involved. Please read the (rather extended) thread starting at 386470 and if your still convinced that I have missed the point then /msg me and we can continue this offline. | [reply] |
by Aristotle (Chancellor) on Aug 29, 2004 at 22:48 UTC | |
by BrowserUk (Patriarch) on Aug 29, 2004 at 23:14 UTC | |
| |
Re: MD5 - what's the alternative
by dragonchild (Archbishop) on Aug 27, 2004 at 11:55 UTC | |
Personally, I think this is about as secure as it could be, given the nature of the Net, even using MD5. The token is only viable for a certain period of time if it's unused, after which it's scrubbed from the tables. If it is used, the token lives for a much shorter period of time. As for password authentication ... I'd probably use SHA-512 for that. The expense in the larger hash is offset by the fact that it only happens once/session. I generate tokens every page request, so it has to be cheaper, CPU-wise. ------
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose I shouldn't have to say this, but any code, unless otherwise stated, is untested | [reply] |
Re: MD5 - what's the alternative
by gmpassos (Priest) on Aug 28, 2004 at 17:49 UTC | |
The most important thing is to know if you have a static or dynamic fingerprint. This means, if the output of MD5 will be generated (with different outputs) every day, or if you have a static fingerprint (like in passwd). If you have a dynamic fingerprint will be much more harder to crack it, but if it's static we have all the time that we need to crack it. Now about CRACK a MD5. Well, what you can know is that is impossible to get back the original text. Like we say, is a fingerprint, not all the body. What is possible to do with MD5, is to find by brute force a string that can produce again the fingerprint, but this doesn't mean that the password used by the user is that string found with brute force. In other words, we always have more than 1 string (actually much more) that produce the same fingerprint. This exists for any "digest" algorithm, the question is that MD5 is faster than SHA, and with MD5 we can use some tricks to reduce the number of attacks. So, use a dynamic fingerprint and always put with the original string some extra data, specially dynamic data, to make the brute force harder,
Graciliano M. P. | [reply] |