Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Compact MD5 representation

by jplindstrom (Monsignor)
on Mar 20, 2001 at 02:00 UTC ( #65556=perlquestion: print w/ replies, xml ) Need Help??
jplindstrom has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

I would like to use MD5 hashes, but with a little shorter string length than e.g. 2dd9f2ba1f924996fa231fdceb4c6f57. The main reason is that when the hash is part of a URL, it will often make the URL longer than what fits in an e-mail.

I figure I can take the hex value and somehow transform it into a base other than hex, e.g. encoded with these characters: a-zA-Z0-9.

But the math and the best way to do this is way over my head.

Any ideas?

/J

Comment on Compact MD5 representation
(tye)Re: Compact MD5 representation
by tye (Cardinal) on Mar 20, 2001 at 02:04 UTC

    Just use md5_base64() instead of md5_hex() (if you are using one of the popular Perl modules for generating your MD5).

            - tye (but my friends call me "Tye")
      The thing is, if I do that it is possible to get an output string with a + / or possibly = in it, and that needs encoding to be kept in a url.

      Of course, I could do that myself pretty easily when I have the base64 hash.

      Thanks!

      /J

        You could also trivially use tr/// to get a base-64-type encoding that uses only things matching [-\w] which requires no escaping in a URL (to preserve minimal length).

                - tye (but my friends call me "Tye")
        I had this issue with MD5 hashes recently, but with even more constraints!

        First, on top of using the MD5 hash in a URL, and sending that URL in an email, I also needed to use the MD5 hash in an actual email address. (Thus allowing the recipient to either go to the URL or reply to the email address.) Second, I couldn't translate to a period within the MD5 hash, because that character was already being used for something else.

        I ended up using this translation (the translation from plus to plus is intentional): tr,+=/,+-_,
        That works fine for the email address. The only iffy part is the plus in the URL; I just made sure that the CGI script doesn't convert the plus to a space.

Re: Compact MD5 representation
by arturo (Vicar) on Mar 20, 2001 at 02:10 UTC

    An email client might wrap the display of the URL onto a second line, but if it's worth it's salt it won't actually stick a newline in the string, which *would* wreck your URL (in the sense of making an unusable hypertext link, for clients that support that sort of thing).

    Other than this mostly display-related issue, I can't think of what you might mean by a URL not fitting into an email. If it's aesthetics you're worried about, well, you already have that ugly string in there, so might as well overdo it while you're into it. (as Steve Vai would say).

    So if you want to provide clickable links, try it with the full-dress md5 hash. An alternative thing is to truncate the hash and hope for the best, i.e. grab the last 16 characters and hope none of the values collides? Or maybe I shouldn't recommend anything so reckless ...

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

      There's nothing reckless about taking only part of an md5 hash. The MD5 algorithm is a good hashing algorithm and ensures that small changes to the source will result in big changes to the hash, so you don't have to worry about similar inputs producing similarities in the (first|last|middle) X bits. As for hoping there are no collisions, reducing the hash size does increase the chance of collision, but there's no guarantee that each full length md5 will be unique. It should be trivial to write code which checks for a collision and fixes it (by appending characters onto the input data until the md5 is unique, for example).
        I'm a php person, so I can't offer code, but I do understand the question which is being asked.

        Rather than showing a url of /?a=123456, the questioner is looking for a url that says /?a=rqz instead. By using the full range of alphanumeric characters, he is hoping to reduce the MD5 from a 32 character string (using 0-9,a-f), into something much smaller, maybe a 12 character string (using 0-9, a-z). This has the benefit of being shorter, while still spanning the same range of values (2^32) as MD5.

        php has the function: string base_convert (string number, int frombase, int tobase), which I'm sure has an equivalent in perl. Once you find this function the problem is reduced to:

        $new_number = base_convert( md5('whatever'), 16, 36 ); // 10 digits + 26 alphanumeric

        Hopefully some perl guru will be able to point you in the right direction.

        --Robert

Re: Compact MD5 representation
by Anonymous Monk on Mar 20, 2001 at 20:41 UTC
    you could only deal with the first/last n character of the checksum.

    This wouldn't be as good as using the full MD5 Csum but would work.
Re: Compact MD5 representation
by YoungPups (Beadle) on Mar 20, 2001 at 22:03 UTC
    I got bored, and tried to write my own base converter (without having first checked CPAN where I would have found Math::BaseCalc which would do what you want just fine, I think). It runs into problems with large numbers (The md5 value listed in your post, for example).

    But if anyone's interested in tinkering, I'll post what I came up with... I think I could fix it by using Math::BigInt, but haven't tried just yet.

    sub convert_base { my($input_string,$from_base,$to_base) = @_; my($working_number,$current_digit,$i); my($quotient,converted_number); # # Convert input string to dec. # for ( $i = 0; $i < length($input_string); $i++ ) { $current_digit = uc(substr($input_string,$i,1)); if ( $current_digit !~ m/[0-9]/ ) { $current_digit = ord($current_digit) - ord("A") + 10; } $working_number += $current_digit * ($from_base ** (length($input_string) - $i - 1)); } # Figure out dec value of this digit. # # Convert int to new base. # while ( $quotient = int($working_number / $to_base) ) { $current_digit = $working_number % $to_base; if ( $current_digit > 9 ) { $current_digit = chr($current_digit - 10 + ord("A")); } $converted_number = $current_digit . $converted_number; $working_number = $quotient; } # # We've dropped out because there's only a bit of remainder left. # $current_digit = $working_number; if ( $current_digit > 9 ) { $current_digit = chr($current_digit - 10 + ord("A")); } $converted_number = $current_digit . $converted_number; return $converted_number; }
      You could use Math::BaseCalc (as suggested above) on the the first half and last half of the MD5, and then concatenate them. If you convert to base32 or base64, you wouldn't waste any bits. (And if you convert to base62 0-9a-zA-Z, you waste a couple bits, but it's portable just about everywhere.)
Re (tilly) 1: Compact MD5 representation
by tilly (Archbishop) on Mar 22, 2001 at 03:35 UTC
    I had a day off today, and this problem bugged me until I did something about it. So here is a solution:
    use Math::Fleximal; print Math::Fleximal->new( "2dd9f2ba1f924996fa231fdceb4c6f57", [0..9, 'a'..'f'] )->change_flex([0..9,'a'..'z','A'..'Z'])->to_str();
    which will print "+1owfX3qQrJC5h1IIWhMy23". Strip off the + and you are in business.

    You will need Math::Fleximal of course.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://65556]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (13)
As of 2014-09-02 21:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (30 votes), past polls