tinkering with base64 encoding

temporal has asked for the wisdom of the Perl Monks concerning the following question:

So I've written a little script that encodes strings and files in base64. I was a little bored and "I'd rather not install another module for a one-off solution."

Basically it reads a chunk into a buffer, converts that chunk to a binary string using unpack and then encodes the result into base64. I added a few bells and whistles but that's the core of the algorithm.

The part about unpacking into a binary string (a string of ones and zeros) seemed really clunky to me. It was initially just a placeholder for a more efficient method to be added later. But I soon found it was a bit tricky to deal with non-standard bit groupings.

I tried using vec to look at the binary contents of the entire buffer but it has some annoying endianness issues. I tried unpacking the buffer a character at a time but it is annoying to deal with that remaining 2 bits.

That said, I think I'll still end up unpacking the buffer a character at a time and use vec to pull out 6 bits of that character and then store the remaining 2 bits to be prepended, etc.

It just seems to me that there ought to be a simpler way to handle a string on a bit-by-bit basis. Generally in Perl when I have to fiddle around this much it means there is a better way that I just don't know about.

Any tips would be appreciated! Also some critique of my coding style wouldn't be amiss either.

Code:

#! perl
# encode a string/file in Base64

use strict;
use Getopt::Std;
use File::Temp qw/tempfile/;

# get and process command line options
my %opts;
getopts('f:o:b:nl:s:d', \%opts);

my ($ifile,
    $ofile, 
    $buff_size,
    $no_lines,
    $max_len,
    $line_sep,
    $decode_flag) = unpack_opts(%opts);

# handle input from file
my $in_fh; 
if ($ifile) {
    # check input file path
    die "bad file/path: $ifile\n" if ! -f $ifile; 
    
    die "nothing to convert!\n" if ! -s $ifile;
    # open file or stdin for reading
    open (FILE, '<', $ifile) or die "could not open input file: $ifile
+\n";
    binmode FILE;
    $in_fh = *FILE;
}
# otherwise get input from STDIN
else {
    binmode STDIN;
    $in_fh = *STDIN;
}

# open file for writing
my ($out_fh, $tmp_fname);
if ($ofile) {
    open ($out_fh, '>', $ofile) or die "could not open output file: $o
+file\n";
}
# otherwise use temp file
else {
    ($out_fh, $tmp_fname) = tempfile(UNLINK => 1) or die "could not cr
+eate tmp file\n";
}

# create lookup table for encoded values
my @lookup = (('A'..'Z'),('a'..'z'),(0..9),('+','/'));

# set token size
# encoding = 6
# decoding = 8
my $token_size = $decode_flag ? 8 : 6;

my ($buffer, $prev_buffer, $line_len);
while(read $in_fh, $buffer, $buff_size) {
    # convert buffer to a binary string
    # TODO: read buffer without converting, this makes buffer x8 bigge
+r
    $buffer = unpack('B*', $buffer);
    # prepend whatever was left from the last buffer
    $buffer = $prev_buffer . $buffer;
    # calculate how many tokens are in this buffer
    my $num_tokens = int(length ($buffer) / $token_size);
    
    # parse tokens
    my $translated = '';
    for (1..$num_tokens) {
        my $token;
        ($token, $buffer) = unpack("a6 a*", $buffer);
        $translated .= $lookup[oct('0b'.$token)];
        # add line separator if max length has been reached
        if (!$no_lines && !(++$line_len % $max_len)) {
            $translated .= $line_sep;
            $line_len = 0; 
        }
    }
    
    print $out_fh $translated;
    
    $prev_buffer = $buffer;
}

# add padding if necessary
my $rest = 6 - length($prev_buffer);
if ($rest && $prev_buffer =~ /\d/) {
    print $out_fh $lookup[oct('0b'.sprintf("%s%0${rest}s", unpack('a*'
+, $prev_buffer)))] . '=' x ($rest / 2);
}

close $out_fh;
close FILE if $ifile;

# if no outfile, write output to STDOUT
if (!$ofile) {
    open (TMP, '<', $tmp_fname) or die "could not read tmp file: $tmp_
+fname\n";
    while (read TMP, $buffer, $buff_size) {
        print $buffer;
    }
    close TMP;
}

# function for unpacking command line options
sub unpack_opts {
    my (%opts) = @_; 
    
    # get input file, default STDIN
    my $ifile = $opts{f} ? $opts{f} : 0;
    
    # get output file, default STDOUT
    my $ofile = $opts{o} ? $opts{o} : 0;
    
    # get buffer size, default 57
    my $buff_size = $opts{b} =~ /\d+/ ? $opts{b} : 57;
    
    # toggle lines
    my $no_lines = $opts{n} ? 1 : 0;
    
    # get max line length, default 76
    my $max_len = $opts{l} =~ /\d+/ ? $opts{l} : 76;
    
    # get line separator
    my $line_sep = $opts{s} ? $opts{s} : "\r\n";
    
    # toggle decode rather than encode
    my $decode_flag = $opts{d} ? 1 : 0;
    
    return ($ifile,
            $ofile,
            $buff_size,
            $no_lines,
            $max_len,
            $line_sep,
            $decode_flag);
}
[download]

Comment on tinkering with base64 encoding Select or Download Code

Replies are listed 'Best First'.
Re: tinkering with base64 encoding by sauoq (Abbot) on Jun 14, 2012 at 21:59 UTC
So I've written a little script that encodes strings and files in base64. I was a little bored and "I'd rather not install another module for a one-off solution." I can buy the being bored part. The part about not installing another module for a one-off solution isn't remotely defensible. In the time it took to write what you have, and the time you've already spent debugging, and the time you spent writing this node.... you could have installed MIME::Base64 and written `use MIME::Base64; # . . . my $encoded = base64_encode($whatever);` [download] many dozens of times over. I'd certainly rather do that. Especially for a "one-off solution". `-sauoq "My two cents aren't worth a dime.";`	[reply] [d/l]
Re^2: tinkering with base64 encoding by tobyink (Canon) on Jun 14, 2012 at 23:34 UTC
"you could have installed MIME::Base64" No need - it's come pre-installed with Perl since version 5.7.3, released over ten years ago. `perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`	[reply]
Re^3: tinkering with base64 encoding by sauoq (Abbot) on Jun 15, 2012 at 00:55 UTC
No need That may be too presumptuous. I still run into systems with 5.005_04 (and sometimes older) on them. `-sauoq "My two cents aren't worth a dime.";`	[reply]
Re^2: tinkering with base64 encoding by temporal (Pilgrim) on Jun 14, 2012 at 22:15 UTC
Whoa! By far the most hostile reply that I've ever gotten on PerlMonks. Maybe I should add that I also thought I'd learn something by rolling my own. Anyway, now that we've established that I'm not so efficient in regard to my use of time - what about my code? ^{Strange things are afoot at the Circle-K.}	[reply]
Re^3: tinkering with base64 encoding by CountZero (Bishop) on Jun 15, 2012 at 06:06 UTC
I do not think that response was hostile. It just pointed out how to be more efficient. If that hit a touchy spot with you, perhaps the response was indeed correct. :) And if you want to have a look at another one's code for doing base64, look at the source for MIME::Base64::Perl. It is much more clean and short than your code, mainly because it uses the "`u`" template in its `pack` and `unpack` functions to first encode/decode the data to uuencode standard (which is very close to base64) and then, after doing some cleanup, uses `tr` to replace the uuencode characters by base64 characters in a one-to-one replacement. Fast, clean and easy to maintain. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics	[reply] [d/l] [select]
Re^3: tinkering with base64 encoding by sauoq (Abbot) on Jun 15, 2012 at 00:41 UTC
Whoa! By far the most hostile reply that I've ever gotten on PerlMonks. What was hostile about it? I certainly didn't mean for it to be so. And you needn't misinterpret my thoughts as a judgment on how you decide to spend your time. It's your time. What I was calling into question was your rationale. It would be unfortunate if someone rather less experienced read your well-written node and took away the notion that rolling your own is a good thing to do when you are writing a one-off, especially when exactly the opposite would probably be better advice. Maybe I should add that I also thought I'd learn something Another fair reason. what about my code? I haven't looked at it. I'd really just use the two line fix I've already mentioned. ;-) `-sauoq "My two cents aren't worth a dime.";`	[reply]
Re: tinkering with base64 encoding by temporal (Pilgrim) on Jun 15, 2012 at 14:49 UTC
Thanks for the great replies, guys. What a neat way to solve the problem in MIME::Base65::Perl, thanks CountZero! I didn't realize that there was a Perl-flavored solution in there. I did check out MIME::Base64, but of course it uses an XS. No worries sauoq, point taken =) ^{Strange things are afoot at the Circle-K.}	[reply]

Back to Seekers of Perl Wisdom