in reply to
Too Convenient Security?
(First off, if MD5 makes you nervous in any way, switch to SHA1 which was designed to compensate for a perceived weakness of MD5 related to authentication, not to simple hashing)
When I produce an MD5 hash using Digest::MD5->md5*() from an input (salt or no) I don't get a string such as the one you indicate (which Ovid points out below is the output of a crypt command). I simply get a 128 bit number. That's a number between 0 and approximately 340,282,366,900,000,000,000,000,000,000,000,000,000 (or 2**128 to be exact ). I prefer to use the md5_hex method, which gives the number as a 32 character string representing the number in hexadecimal. For those playing the home game, 2**128 = 16**32. If I did get a string like you show, I would remove all but the hash itself. This should be nearly impossible to crack...
There are 60*60*24*365.25 seconds in the average year. That's 31,557,600 seconds. At 1,000,000 operations per second, you will be able to brute force all possible hashes in just over 10**25 years. Unless you know of a better way to go from a 128 bit number back to a matching input, such that I could enter that input into the password field on your CGI and thereby gain access. I'd probably have better luck simply running a high-end dictionary crack-- which even so would hopefully raise some alert (so after a hundred consecutive failed attempts to login you should disable the user).
Now, if the algorithm that produces MD5 hashes is such that there are "blank spots" between 0 and 2**128, where in order to find potential inputs (and these are always potentials, the function is not inversible if two inputs can have the same output) we don't need to test (possibly) all 2**128, then the algorithm is less than perfect. But unless it gets the aforementioned brute force timing down to something closer to a few years rather than eons, it is not that important.
One question for our real experts in the audience, wouldn't it be possible to store both the MD5 and the SHA1 hashes, such that an input matching both outputs has an extremely high chance of being unique? Or does knowing both of these outputs make reversing the algorithm to a valid input that much easier?
I think for your example, you might want to remove the salt as a constant and generate a unique salt for each user. But something simple, like reversing the user ID and doing a simple Caesar cipher on it or something to that effect. That way it is different, but predictable, on a per user basis. But I think anyone who can get your hash file is also likely to have access to your script and/or your "managed salt" file-- so I wouldn't put this much effort into it: if I have either of those two things I can easily determined what salt applies to which users (and as long as the salt is different for each user, then I have to rebuild my dictionary to attack each user separately). If the information is that sensitive take it offline.
Frankly, I think the odds of someone having access to the passwords file (even read only) mitigates the likelihood that they need to crack a password to get on the system at all--provided you properly secure the filesystem of the host to begin with. Is it a shared host? Then that password file should only be readable by the user that the CGI process will run as (*not* nobody as is often the case).
Finally, remember this all comes down to the passwords. Are they computationally inconvenient? If not, I might simply use LWP to keep submitting until I find a match. Your CGI should prevent weak passwords. And as part of defense in depth I would (as I said) limit the number of invalid tries. So once you've secured all this, I (as an attacker) will just switch modes to DoS or something: how about I just get your page posted to Slashdot? *grin*