Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
a finite, large number but not infinite.

md5 hashing can be applied to inputs of any length. "Any length" means an infinite number of possible inputs. Adapting the old schoolboy definition of infinite, however many inputs you have hashed, you can always add one more byte to any one of them to get one more. Infinite.

Every input in this range of inputs will be mapped to one of the 2**128 possible md5s.

Therefore the number of inputs that will map to any given md5 is infinity / 2**128

Which is infinity.


It has often been said that there are collisions in MD5.

Of course there are. It could not be otherwise. Whenever the range of inputs is greater than the range of outputs, there will always be collisions. It simply is not possible for it to be otherwise. (Would that it were, it would make a brilliant compression algorithm:)

Theoretically, if you generated the md5s for all of the (16-byte binary encoded) integers 0 .. 2**128 -1, you might not find any duplicates. That is to say, the input range equals the output range, so it is possible that the md5 algorithm would be a 'perfect hash' for the input data.

And if you double the length of the input data to 32 bytes, then (theoretically) there will be exactly 2 inputs (plaintexts) for every output (md5).

I don't think that this "perfect hash" status has ever been confirmed (or even suggested). It would simply take too long to verify.

The example of duplicate plaintexts posted were 128 (8-bit) bytes each. 2**1024 inputs / 2**128 outputs.

There will be at least 8 duplicate, 128-byte inputs for every md5. That makes it much easier to find a duplicate.

As far as your Linux passwords are concerned, not only would the cracker need to generate a plaintext that gave the same MD5, it would also need to be of a length that the password verification process would accept as a password. Whilst accepting longer passwords generally gives greater security, that is only true up to a certain limit. With md5, that limit is 16 (8-bit) bytes.

If the plaintexts where restricted to 32 x 8-bit bytes, then there will be 2**256 inputs and 2**128 outputs. Therefore there *must be* at least 2 x 32-byte inputs that will produce every md5.

If the plaintext is restricted to 16 x 8-bit bytes, then it is possible (but not proven), that the are no duplicates.

If the inputs are restricted to 12 (8-bit bytes), then it is very unlikely that it will be duplicates. If you further restrict that to 7-bit ASCII, even less likelyhood. Exclude control characters, less again.

This is one of those cases where more isn't always better.

Actually, in my experience, more is very rarely better in computing:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re^3: MD5 - what's the alternative by BrowserUk
in thread MD5 - what's the alternative by kiat

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2024-03-28 22:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found