http://www.perlmonks.org?node_id=555733

albert has asked for the wisdom of the Perl Monks concerning the following question:

I have a series of binary strings, such as:
$str = "0000000100100011010001010110011110001001101010111100110111101111"
I want to create a base64 representation of this for compression and storage in text format.

I've tried using MIME::Base64 with

encode_base64(pack("B64", $str))
However, that gives a result 13 characters long for my 64-bit example. I'd rather get compression down to 4 characters in this case. How can I go about that? Also, is there a good solution for a base32 representation?

-albert

Update: As Grandfather politely points out, I am an idiot. :-). The minimum would be 11, not 4.

Replies are listed 'Best First'.
Re: Binary string to base64
by GrandFather (Saint) on Jun 16, 2006 at 10:52 UTC

    Unless I misunderstand what you are trying to do - you can't! 64 bits is 8 bytes. base64 encoding encodes 6 bits per (8 bit) printable character for transmission over a network or some such system. 64 bits require at least 11 base64 characters for encoding - it is not a compression system.
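To make the arithmetic concrete, here is a minimal sketch that measures what MIME::Base64 produces for the 64-bit example. The variable names are mine, not from the thread. The 13 characters in the question are 11 data characters, one '=' of padding, and the trailing newline that encode_base64 appends by default.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use POSIX qw(ceil);

my $str   = '0000000100100011010001010110011110001001101010111100110111101111';
my $bytes = pack 'B64', $str;                  # 64 binary digits -> 8 raw bytes

my $with_newline = encode_base64($bytes);      # default: "\n" is appended
my $bare         = encode_base64($bytes, '');  # empty second argument suppresses it
(my $unpadded = $bare) =~ s/=+\z//;            # drop the '=' padding

# Lengths of the three forms, then the theoretical minimum ceil(64 / 6)
printf "%d %d %d %d\n",
    length($with_newline), length($bare), length($unpadded), ceil(64 / 6);
```

If the padding is stripped for storage, the decoder needs some other way to know the original length, so treat the unpadded form as an illustration of the lower bound rather than a drop-in format.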

    This may be another XY Problem, in which case a description of what you are trying to achieve would be a good idea.


    DWIM is Perl's answer to Gödel
      OK. You are right, I was having a serious brain fart (2**8 != 64). I shouldn't try to work when I have a bad cold. Let me try to clarify.

      I have a series of these 'binary' strings, which are either 45, 60, 90, or 120 characters in length. And I have several million of these strings. I want to store them in a database, but still using a text based storage and achieving maximum efficiency. I thought I might be able to take substrings of 'binary' digits, and convert them into a series of 64 printable characters (being the max that can be done with ASCII). Since I have approximately 20 million of these strings, I thought I might want to get maximum efficiency in my text encoding. What would be the recommendation on that?
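For a sense of scale, a rough sketch of the storage needed per string at the four lengths mentioned. The helper names are hypothetical, and the base64 figure assumes a bare 6-bits-per-character encoding with no '=' padding or newline, which is not what MIME::Base64 emits by default.

```perl
use strict;
use warnings;

# Characters needed at 6 bits per character, rounded up (no padding)
sub base64_chars { my ($bits) = @_; return int(($bits + 5) / 6); }

# Raw bytes needed at 8 bits per byte, rounded up
sub packed_bytes { my ($bits) = @_; return int(($bits + 7) / 8); }

for my $bits (45, 60, 90, 120) {
    printf "%3d digits: %2d base64 characters, %2d packed bytes\n",
        $bits, base64_chars($bits), packed_bytes($bits);
}
```

So the 120-digit strings would shrink to 20 printable characters each, or 15 bytes if the database column can hold raw binary.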

      But, given my brain fart, I might just go with the uuencoded solution.

      -a

Re: Binary string to base64
by ioannis (Abbot) on Jun 16, 2006 at 10:57 UTC
    base64 is meant for encoding high-bit data into a form suitable for transmission under the 7bit rules. It has nothing to do with compression. If you want to compress a 64-character string of 0's and 1's, use pack.
    # The sender packs it (b64 covers all 64 digits; b4 would keep only the first four)
    my $p = pack 'b64', $str;
    # And the receiver unpacks it like this:
    print unpack 'b64', $p;
Re: Binary string to base64
by roboticus (Chancellor) on Jun 16, 2006 at 11:47 UTC
    albert:

    A couple of notes:

    If you're trying to compress the data into a "safe" form for transmission, you might want to try pack's 'w' data type, which will put data into a 7-bit (base 128) form, which will squeeze out a little.

    I'm not aware of a standard for base32 representation, but it's easy enough to come up with one.

    If you're trying to compress data only, short strings don't tend to compress all that well with most compression schemes as they learn from the data you're trying to compress. If the data has some structure or pattern that you know, you can enforce a coding scheme on top of it. For example, one scheme I've seen for compressing numbers relied on the fact that most numbers in the application are small, but they had to have a 32-bit (unsigned) range. So we encoded them such that a number under 128 would be represented in a single byte with the topmost bit equal to 0, numbers under 16384 were represented as a two-byte number with the top two bits set to 10, and all the others were represented with a five-byte number with the top two bits set to 11. Sure, the larger numbers were longer than 32 bits, but the small numbers were so frequent that it compressed very well. (We could've chopped the range again, but it was more effort for insignificant payoff...)
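A minimal sketch of a variable-length scheme along those lines. This is not the original application's code: the function names are made up, and I am assuming the two-byte form is tagged with the bits 10 so that it cannot be mistaken for the one-byte form (whose top bit is 0), with the five-byte form being a 0xC0 tag byte followed by the full 32-bit value.

```perl
use strict;
use warnings;

# Encode one unsigned 32-bit number in 1, 2, or 5 bytes
sub encode_num {
    my ($n) = @_;
    return pack 'C', $n           if $n < 128;      # 0xxxxxxx
    return pack 'n', 0x8000 | $n  if $n < 16384;    # 10xxxxxx xxxxxxxx
    return pack 'CN', 0xC0, $n;                     # 11000000 + 4 value bytes
}

# Decode a buffer of concatenated encoded numbers back into a list
sub decode_nums {
    my ($buf) = @_;
    my @out;
    my $pos = 0;
    while ($pos < length $buf) {
        my $first = ord substr($buf, $pos, 1);
        if ($first < 0x80) {                        # one-byte form
            push @out, $first;
            $pos += 1;
        }
        elsif (($first & 0xC0) == 0x80) {           # two-byte form
            push @out, unpack('n', substr($buf, $pos, 2)) & 0x3FFF;
            $pos += 2;
        }
        else {                                      # five-byte form
            push @out, unpack('N', substr($buf, $pos + 1, 4));
            $pos += 5;
        }
    }
    return @out;
}

my @nums    = (5, 127, 128, 16383, 16384, 4294967295);
my $encoded = join '', map { encode_num($_) } @nums;
my @decoded = decode_nums($encoded);
printf "%d numbers in %d bytes\n", scalar(@nums), length($encoded);
```

The payoff depends entirely on the distribution: with mostly small values nearly everything costs one byte, while the rare large value costs five instead of four.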

    I hope this helps...

    --roboticus

Re: Binary string to base64
by johngg (Canon) on Jun 16, 2006 at 15:20 UTC
    I've put together a couple of subroutines to pack and unpack binary strings. They allow for any length of string by prepending the original string length to the packed string so that the right string length can be used in the unpack template, e.g. b45 rather than b*. That way you don't get extra zeros padding the unpacked binary string. Here is a script that tests packing and unpacking random binary strings of a given length one hundred times. I have tested with strings of 12000 digits with no problem.

    Here it is

    Give it an argument of the length of string you want to test.
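The script itself is not shown in this copy of the thread. As a rough sketch of the approach described above (this is not JohnGG's code; the function names and the choice of a 32-bit length prefix are my assumptions), the length can be stored in front of the packed bits and then used to build the unpack template:

```perl
use strict;
use warnings;

# Hypothetical helper: store the digit count as a 32-bit big-endian
# integer, followed by the digits packed eight to a byte.
sub pack_bits {
    my ($bits) = @_;
    return pack 'N B*', length($bits), $bits;
}

# Read the count back and use it in the template (e.g. B45), so the
# zero bits that padded the last byte are not returned.
sub unpack_bits {
    my ($packed) = @_;
    my ($len, $body) = unpack 'N a*', $packed;
    return unpack "B$len", $body;
}

my $bits   = '101' x 15;                  # a 45-digit example string
my $packed = pack_bits($bits);
printf "%d digits -> %d bytes\n", length($bits), length($packed);
print unpack_bits($packed) eq $bits ? "round trip ok\n" : "mismatch\n";
```

If all strings in a column have the same known length, the prefix can be dropped and the length supplied by the caller instead.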

    I hope that it is useful.

    Cheers,

    JohnGG

Re: Binary string to base64
by Moron (Curate) on Jun 16, 2006 at 12:25 UTC
    The MIME solution is a family of formats for different content types and requires a MIME header to identify what precise format will have encoded the body - rather an awkward overhead for your simpler requirement.

    'Base 64' suggests you want 64 different "digits" to represent each combination of log2(64) = 6 bits. Text format also suggests 8 bits per character so that two bits would be unused. One simple format that contains only printable characters would be the 64 element subrange of ASCII from '0' (ASCII 48) to 'o' (ASCII 111), leading to the simple conversion algorithm of converting the 6 bits into ordinary octal, adding 48 and converting to ASCII using the perl pack, unpack and chr functions. For base 32, a similar idea could be used or you could just extend the hexadecimal format from 0..F to 0..V - but that would require supporting the gap in ASCII between '9' and 'A'.
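A minimal sketch of that '0'..'o' idea, with assumptions of mine spelled out: the function names are hypothetical, the input is padded on the right with zeros to a multiple of 6 digits, and the caller has to remember the original length to trim the padding on decode.

```perl
use strict;
use warnings;

# Map each group of 6 binary digits to one character in ASCII 48..111
sub bits_to_text {
    my ($bits) = @_;
    my $pad = (6 - length($bits) % 6) % 6;     # zeros needed to fill the last group
    $bits .= '0' x $pad;
    return join '', map { chr(48 + oct("0b$_")) } $bits =~ /(.{6})/g;
}

# Reverse the mapping and cut back to the original number of digits
sub text_to_bits {
    my ($text, $len) = @_;
    my $bits = join '', map { sprintf '%06b', ord($_) - 48 } split //, $text;
    return substr $bits, 0, $len;
}

my $str  = '0000000100100011010001010110011110001001101010111100110111101111';
my $text = bits_to_text($str);
printf "%d digits -> %d characters\n", length($str), length($text);
print text_to_bits($text, length $str) eq $str ? "round trip ok\n" : "mismatch\n";
```

One thing to check before using this range in a database: it includes characters such as the backslash and the backtick, which may need escaping depending on how the strings are inserted.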

    -M

    Free your mind

Re: Binary string to base64
by zentara (Archbishop) on Jun 16, 2006 at 13:52 UTC
    I'm in over my head here, but here is a way to get the compression down to 5 characters.
    #results 0000000100100011010001010110011110001001101010111100110111101111 -> iL1@;
    This almost works. I can only get the last half of the bin string back, but maybe the gurus can see the packing error. Maybe needs 64 bit numbers or Math::BigInt?
    #!/usr/bin/perl
    use strict;
    use Math::Base85;

    my $str = "0000000100100011010001010110011110001001101010111100110111101111";

    #binary to decimal conversion
    my $num = unpack("N", pack("B32", substr("0" x 32 . $str, -32)));
    print "$num\n";

    my $m = Math::Base85::to_base85($num);
    print "$str ->\t$m\n";

    #decode it
    ##############################################
    my $q = Math::Base85::from_base85($m);
    print "back to decimal ->\t$q\n";

    #I'm losing a 32 bit chunk here
    my $str1 = unpack("B32", pack("N", $q));
    print "back to bin ->\n", $str1,"\n";

    I'm not really a human, but I play one on earth. flash japh
      Thanks for the suggestion:

      Working within Math::BigInt, it is easy to make the integer by prepending the string with '0b'. Then, between Math::Base85 and Math::BigInt, it is easy to go back and forth, though one needs to prepend '0's after the decoding to get back to the original length.

      use Math::BigInt;
      use Math::Base85 qw(from_base85 to_base85);

      $str = '0b0000000100100011010001010110011110001001101010111100110111101111';
      $bint = Math::BigInt->new($str);
      $b85str = to_base85($bint);
      $bint2 = from_base85($b85str);
      print $bint2->as_bin();
      This does exactly what I want now. And I get slightly more efficiency than with base64.
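For the step of putting the leading zeros back, a small sketch using only Math::BigInt. The helper name is hypothetical, and it assumes the original length of each string (45, 60, 90, or 120 here) is known at decode time.

```perl
use strict;
use warnings;
use Math::BigInt;

# Turn a Math::BigInt back into a fixed-width string of binary digits
sub restore_bits {
    my ($bint, $width) = @_;
    (my $bits = $bint->as_bin()) =~ s/^0b//;           # as_bin() returns e.g. "0b1001..."
    return ('0' x ($width - length $bits)) . $bits;    # left-pad to the original width
}

my $str  = '0000000100100011010001010110011110001001101010111100110111101111';
my $bint = Math::BigInt->new("0b$str");                # the leading zeros are lost here
print restore_bits($bint, length $str), "\n";
```

The same helper can be applied to whatever from_base85 returns, since that is also a big integer.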

      -albert

        Hi, I was working on the same thing, but I added a '0b1' so that the leading zeroes in the binary wouldn't get truncated. It condenses it down to 10 characters. Here's my attempt:
        #!/usr/bin/perl
        use Math::BigInt;
        use Math::Base85;

        my $str = '0000000100100011010001010110011110001001101010111100110111101111';
        print "$str\n";

        #prepend a 0b1 to signal binary and contain leading zeroes
        my $str_p = '0b1'.$str;
        print "$str_p\n\n";

        my $bignum = Math::BigInt->new($str_p);
        my $hex = $bignum->as_hex();
        print "hex $hex\n\n";

        my $hex85 = Math::Base85::to_base85($hex);
        print "hex85-> $hex85\n\n";

        my $hex_back = Math::Base85::from_base85($hex85);
        print "hex_back-> $hex_back\n";

        my $bignum1 = Math::BigInt->new($hex_back);
        my $b_back = $bignum1->as_bin();

        #strip off the '0b1'
        $b_back = substr($b_back, 3);
        print "out-> $b_back\n";
        print "in -> $str\n";

        I'm not really a human, but I play one on earth. flash japh