Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

OT: Is MD5 ALWAYS 32 character long?

by DaWolf (Curate)
on Dec 18, 2003 at 11:31 UTC ( [id://315494]=perlquestion: print w/replies, xml ) Need Help??

DaWolf has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, fellow monks.

I know that this question may sound bizarre, but since I'm designing a database and always & never are very dangerous words, it must be asked.

I'm making a table which will store the MD5 hash of a users password, so:

Is it completely safe to make the password field a Char(32)? Or should I make it a Varchar(32)?

Thanks in advance,

my ($author_nickname, $author_email) = ("DaWolf","erabbott\@terra.com.br") if ($author_name eq "Er Galvão Abbott");

Replies are listed 'Best First'.
Re: OT: Is MD5 ALWAYS 32 character long?
by liz (Monsignor) on Dec 18, 2003 at 11:41 UTC
    From the pod of Digest::MD5:
    DESCRIPTION
    The "Digest::MD5" module allows you to use the RSA Data Security Inc. MD5 Message Digest algorithm from within Perl programs. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input.

    So, the fingerprint is always a 128-bit value, which translates to 32 characters if you are using a hex-presentation of that value (each hex character taking 4 bits).

    So, if you want to store the MD5 as a string, Char(32) would be ok. Ideally, you would want to store it as the 128-bit value, which your database may or may not support.

    Liz

Re: OT: Is MD5 ALWAYS 32 character long?
by tachyon (Chancellor) on Dec 18, 2003 at 12:36 UTC

    NO. An MD5 hash is 16 bytes - 128 bits no more - no less. It can be stored >not truncated< in a CHAR(16+) field or a VARCHAR(16+) field as binary data. As base 64 the same 128 bits occupies 22 bytes (yes that is CHAR(22) or VARCHAR(22)) and as hex it occupies 32 bytes, etc

    Short answer yes

    Longer answer VARCHAR slows indexing significantly. One varchar field means table is varchar. CHAR is space inefficient but much faster than varchar for select. AAYMMV

    cheers

    tachyon

      CHAR is space inefficient but much faster than varchar for select.

      In this case, it's more space efficient, since you know that a MD5 string will always be a certain length l for a given encoding. The CHAR field will only need to store l characters, whereas VARCHAR will need to store l + the length of l. So I don't see any benefit to using VARCHAR in this case.

      Some databases may support bit vectors (like PostgreSQL's BIT type). In that case, you'll only need 16 bytes of storage space (assuming you use a non-VARYING bit field) compared to 32 bytes for hex encoding (assuming an 8-bit text encoding standard like ASCII (ok, technically 7-bit, but who actually stores it like that anymore?)). However, it's not portable to databases that don't support that type (like MySQL).

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

Re: OT: Is MD5 ALWAYS 32 character long?
by edan (Curate) on Dec 18, 2003 at 11:45 UTC

    I believe that the RSA MD5 checksum is always 128 bits (someone correct me if I'm wrong). That means that it is 16 bytes long. However, when storing an 'MD5 checksum', you'll generally store it in some encoding, generally hex. If it is hex encoded, then the resulting string is 32 characters long, no more, no less.

    So, the answer is 'no', since it depends on how you encode it, but if you know for a fact that it will be hex-encoded, then you can safely use a field length of 32.

    --
    3dan

Re: OT: Is MD5 ALWAYS 32 character long?
by !1 (Hermit) on Dec 18, 2003 at 12:02 UTC

    For your reading pleasure, RFC 1321 - The MD5 Message-Digest Algorithm. It's a 128-bit fingerprint that can be encoded in any manner you choose. If the source is encoding the fingerprint as hex, then yes it will always be 32 characters long. If not, then it would be in some format that you could convert to a hex representation which will also be 32 characters long.

Re: OT: Is MD5 ALWAYS 32 character long?
by derby (Abbot) on Dec 18, 2003 at 11:47 UTC
    The actual MD5 hash is 128 bits ... the hexadecimal representation of those bits is 32 bytes, the base64 representation is 22 bytes, and of course, the binary representation will be 16 bytes. That is set (look at the rfc). If you need more bits, you'll need to use a different hashing algorithm (like SHA-1).

    -derby

Re: OT: Is MD5 ALWAYS 32 character long?
by l3nz (Friar) on Dec 18, 2003 at 13:42 UTC
    I believe that, as the 32 byte string you see is the textual encoding of a 128 bit (16 bytes) binary string, it will consistently hold the same length no matter what the binary content will be.

    About CHAR or VARCHAR, it depends. It's usually better to use a CHAR because in most implementations a VARCHAR will require one more page fetch per string read because VARCHAR data will be stored in a separate disk table, but if you plan to access one row at a time and you will fetch that row from an index seek, the performance cost will be quite small anyway. On the other side, I would not routinely query a one million row database with a LIKE statement on a VARCHAR.... :-)

    A number of SQL implementations (mySQL for one) don't support mixed CHAR / VARCHAR fields in the same row, so if you change one field to VARCHAR all other CHARs will be converted to VARCHAR. So beware!

    Please note that whether you say CHAR(32) or VARCHAR(32), only the first 32 chars will be stored in the database.

Re: OT: Is MD5 ALWAYS 32 character long?
by castaway (Parson) on Dec 18, 2003 at 12:23 UTC
    'scuse me for wondering, but.. What difference does it make? Both char(32) and varchar(32) are maximum length 32 chars.. Or were you wondering if they were lots shorter with a max of 32, to warrant varchar?

    /me doesnt get it.

    C.

      If it is a char(32) there will be trailing spaces if the value is not 32 chars long. This could affect a lookup. If the match value is not sent in with trailing spaces to make it 32 chars long, the match will fail. At least this is the case in Oracle.

      sth

        If the match value is not sent in with trailing spaces to make it 32 chars long, the match will fail. At least this is the case in Oracle.

        Judging from what my copy of Oracle does, the output of select 'yes' from dual where 'hello ' = 'hello'; may surprise you.

        The SQL-92 standard says that trailing spaces should not affect whether two things are =, and Oracle seems to be in compliance. I think that any other behaviour would be a bug.

        Trailing spaces will affect like, decode, and so on though. And, of course, if you are actually fetching back the value and comparing in Perl, then you will definitely see the difference.

        So depending on how you intend to do the lookup, trailing spaces might not matter to you.

Re: OT: Is MD5 ALWAYS 32 character long?
by zakzebrowski (Curate) on Dec 18, 2003 at 23:37 UTC
    Write a sample program using Digest::MD5. Looking at the manual, you will realize you basically have three options
    • md5 - produces a (non human readable) binary string 16 bytes long
    • md5hex - 32 bytes of hex
    • md5_base64 - a base 64 string
    From that, you have to ask yourself, "self, do I want a base 64 string? Self, do I want a human readable 32 byte string? Self, do I want to go for storage and use a binary string?" Personally, because I don't like dealing with binary information, and I don't need a base 64 string, I would use the 32 byte hex string. I would then write a simple test program (below) which shows me the length of a string. (I could enhance it by taking an md5 checksum of the checksum itself, but that's left as an exercise to the user...) Also, I know from database classes that char storage when you have a fix length, is better than varchar storage for fixed length, I would use a char(32) for the column type. (Of course, the database may silently convert char to a varchar2 (like oracle) since the char size is large, but that's an implementation detail of the database system you use.)
    Default@z ~ $ more test.pl use Digest::MD5 qw (md5_hex); my $digest = Digest::MD5->new; $digest->add("Hi"); print length($digest->hexdigest);


    ----
    Zak

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://315494]
Approved by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-04-19 21:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found