http://www.perlmonks.org?node_id=976307

temporal has asked for the wisdom of the Perl Monks concerning the following question:

So I've written a little script that encodes strings and files in base64. I was a little bored and "I'd rather not install another module for a one-off solution."

Basically it reads a chunk into a buffer, converts that chunk to a binary string using unpack and then encodes the result into base64. I added a few bells and whistles but that's the core of the algorithm.

The part about unpacking into a binary string (a string of ones and zeros) seemed really clunky to me. It was initially just a placeholder for a more efficient method to be added later. But I soon found it was a bit tricky to deal with non-standard bit groupings.

I tried using vec to look at the binary contents of the entire buffer but it has some annoying endianness issues. I tried unpacking the buffer a character at a time but it is annoying to deal with that remaining 2 bits.

That said, I think I'll still end up unpacking the buffer a character at a time and use vec to pull out 6 bits of that character and then store the remaining 2 bits to be prepended, etc.

It just seems to me that there ought to be a simpler way to handle a string on a bit-by-bit basis. Generally in Perl when I have to fiddle around this much it means there is a better way that I just don't know about.

Any tips would be appreciated! Also some critique of my coding style wouldn't be amiss either.

Code:

#! perl # encode a string/file in Base64 use strict; use Getopt::Std; use File::Temp qw/tempfile/; # get and process command line options my %opts; getopts('f:o:b:nl:s:d', \%opts); my ($ifile, $ofile, $buff_size, $no_lines, $max_len, $line_sep, $decode_flag) = unpack_opts(%opts); # handle input from file my $in_fh; if ($ifile) { # check input file path die "bad file/path: $ifile\n" if ! -f $ifile; die "nothing to convert!\n" if ! -s $ifile; # open file or stdin for reading open (FILE, '<', $ifile) or die "could not open input file: $ifile +\n"; binmode FILE; $in_fh = *FILE; } # otherwise get input from STDIN else { binmode STDIN; $in_fh = *STDIN; } # open file for writing my ($out_fh, $tmp_fname); if ($ofile) { open ($out_fh, '>', $ofile) or die "could not open output file: $o +file\n"; } # otherwise use temp file else { ($out_fh, $tmp_fname) = tempfile(UNLINK => 1) or die "could not cr +eate tmp file\n"; } # create lookup table for encoded values my @lookup = (('A'..'Z'),('a'..'z'),(0..9),('+','/')); # set token size # encoding = 6 # decoding = 8 my $token_size = $decode_flag ? 8 : 6; my ($buffer, $prev_buffer, $line_len); while(read $in_fh, $buffer, $buff_size) { # convert buffer to a binary string # TODO: read buffer without converting, this makes buffer x8 bigge +r $buffer = unpack('B*', $buffer); # prepend whatever was left from the last buffer $buffer = $prev_buffer . $buffer; # calculate how many tokens are in this buffer my $num_tokens = int(length ($buffer) / $token_size); # parse tokens my $translated = ''; for (1..$num_tokens) { my $token; ($token, $buffer) = unpack("a6 a*", $buffer); $translated .= $lookup[oct('0b'.$token)]; # add line separator if max length has been reached if (!$no_lines && !(++$line_len % $max_len)) { $translated .= $line_sep; $line_len = 0; } } print $out_fh $translated; $prev_buffer = $buffer; } # add padding if necessary my $rest = 6 - length($prev_buffer); if ($rest && $prev_buffer =~ /\d/) { print $out_fh $lookup[oct('0b'.sprintf("%s%0${rest}s", unpack('a*' +, $prev_buffer)))] . '=' x ($rest / 2); } close $out_fh; close FILE if $ifile; # if no outfile, write output to STDOUT if (!$ofile) { open (TMP, '<', $tmp_fname) or die "could not read tmp file: $tmp_ +fname\n"; while (read TMP, $buffer, $buff_size) { print $buffer; } close TMP; } # function for unpacking command line options sub unpack_opts { my (%opts) = @_; # get input file, default STDIN my $ifile = $opts{f} ? $opts{f} : 0; # get output file, default STDOUT my $ofile = $opts{o} ? $opts{o} : 0; # get buffer size, default 57 my $buff_size = $opts{b} =~ /\d+/ ? $opts{b} : 57; # toggle lines my $no_lines = $opts{n} ? 1 : 0; # get max line length, default 76 my $max_len = $opts{l} =~ /\d+/ ? $opts{l} : 76; # get line separator my $line_sep = $opts{s} ? $opts{s} : "\r\n"; # toggle decode rather than encode my $decode_flag = $opts{d} ? 1 : 0; return ($ifile, $ofile, $buff_size, $no_lines, $max_len, $line_sep, $decode_flag); }

Replies are listed 'Best First'.
Re: tinkering with base64 encoding
by sauoq (Abbot) on Jun 14, 2012 at 21:59 UTC
    So I've written a little script that encodes strings and files in base64. I was a little bored and "I'd rather not install another module for a one-off solution."

    I can buy the being bored part.

    The part about not installing another module for a one-off solution isn't remotely defensible.

    In the time it took to write what you have, and the time you've already spent debugging, and the time you spent writing this node.... you could have installed MIME::Base64 and written

    use MIME::Base64; # . . . my $encoded = base64_encode($whatever);
    many dozens of times over. I'd certainly rather do that. Especially for a "one-off solution".

    -sauoq
    "My two cents aren't worth a dime.";

      "you could have installed MIME::Base64"

      No need - it's come pre-installed with Perl since version 5.7.3, released over ten years ago.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        No need

        That may be too presumptuous. I still run into systems with 5.005_04 (and sometimes older) on them.

        -sauoq
        "My two cents aren't worth a dime.";

      Whoa! By far the most hostile reply that I've ever gotten on PerlMonks.

      Maybe I should add that I also thought I'd learn something by rolling my own.

      Anyway, now that we've established that I'm not so efficient in regard to my use of time - what about my code?

      Strange things are afoot at the Circle-K.

        I do not think that response was hostile. It just pointed out how to be more efficient. If that hit a touchy spot with you, perhaps the response was indeed correct. :)

        And if you want to have a look at another one's code for doing base64, look at the source for MIME::Base64::Perl.

        It is much more clean and short than your code, mainly because it uses the "u" template in its pack and unpack functions to first encode/decode the data to uuencode standard (which is very close to base64) and then, after doing some cleanup, uses tr to replace the uuencode characters by base64 characters in a one-to-one replacement. Fast, clean and easy to maintain.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
        Whoa! By far the most hostile reply that I've ever gotten on PerlMonks.

        What was hostile about it? I certainly didn't mean for it to be so.

        And you needn't misinterpret my thoughts as a judgment on how you decide to spend your time. It's your time.

        What I was calling into question was your rationale. It would be unfortunate if someone rather less experienced read your well-written node and took away the notion that rolling your own is a good thing to do when you are writing a one-off, especially when exactly the opposite would probably be better advice.

        Maybe I should add that I also thought I'd learn something

        Another fair reason.

        what about my code?

        I haven't looked at it. I'd really just use the two line fix I've already mentioned. ;-)

        -sauoq
        "My two cents aren't worth a dime.";
Re: tinkering with base64 encoding
by temporal (Pilgrim) on Jun 15, 2012 at 14:49 UTC

    Thanks for the great replies, guys.

    What a neat way to solve the problem in MIME::Base65::Perl, thanks CountZero! I didn't realize that there was a Perl-flavored solution in there. I did check out MIME::Base64, but of course it uses an XS.

    No worries sauoq, point taken =)

    Strange things are afoot at the Circle-K.