comment on

So I've written a little script that encodes strings and files in base64. I was a little bored and "I'd rather not install another module for a one-off solution."

Basically it reads a chunk into a buffer, converts that chunk to a binary string using unpack and then encodes the result into base64. I added a few bells and whistles but that's the core of the algorithm.

The part about unpacking into a binary string (a string of ones and zeros) seemed really clunky to me. It was initially just a placeholder for a more efficient method to be added later. But I soon found it was a bit tricky to deal with non-standard bit groupings.

I tried using vec to look at the binary contents of the entire buffer but it has some annoying endianness issues. I tried unpacking the buffer a character at a time but it is annoying to deal with that remaining 2 bits.

That said, I think I'll still end up unpacking the buffer a character at a time and use vec to pull out 6 bits of that character and then store the remaining 2 bits to be prepended, etc.

It just seems to me that there ought to be a simpler way to handle a string on a bit-by-bit basis. Generally in Perl when I have to fiddle around this much it means there is a better way that I just don't know about.

Any tips would be appreciated! Also some critique of my coding style wouldn't be amiss either.

Code:

#! perl
# encode a string/file in Base64

use strict;
use Getopt::Std;
use File::Temp qw/tempfile/;

# get and process command line options
my %opts;
getopts('f:o:b:nl:s:d', \%opts);

my ($ifile,
    $ofile, 
    $buff_size,
    $no_lines,
    $max_len,
    $line_sep,
    $decode_flag) = unpack_opts(%opts);

# handle input from file
my $in_fh; 
if ($ifile) {
    # check input file path
    die "bad file/path: $ifile\n" if ! -f $ifile; 
    
    die "nothing to convert!\n" if ! -s $ifile;
    # open file or stdin for reading
    open (FILE, '<', $ifile) or die "could not open input file: $ifile
+\n";
    binmode FILE;
    $in_fh = *FILE;
}
# otherwise get input from STDIN
else {
    binmode STDIN;
    $in_fh = *STDIN;
}

# open file for writing
my ($out_fh, $tmp_fname);
if ($ofile) {
    open ($out_fh, '>', $ofile) or die "could not open output file: $o
+file\n";
}
# otherwise use temp file
else {
    ($out_fh, $tmp_fname) = tempfile(UNLINK => 1) or die "could not cr
+eate tmp file\n";
}

# create lookup table for encoded values
my @lookup = (('A'..'Z'),('a'..'z'),(0..9),('+','/'));

# set token size
# encoding = 6
# decoding = 8
my $token_size = $decode_flag ? 8 : 6;

my ($buffer, $prev_buffer, $line_len);
while(read $in_fh, $buffer, $buff_size) {
    # convert buffer to a binary string
    # TODO: read buffer without converting, this makes buffer x8 bigge
+r
    $buffer = unpack('B*', $buffer);
    # prepend whatever was left from the last buffer
    $buffer = $prev_buffer . $buffer;
    # calculate how many tokens are in this buffer
    my $num_tokens = int(length ($buffer) / $token_size);
    
    # parse tokens
    my $translated = '';
    for (1..$num_tokens) {
        my $token;
        ($token, $buffer) = unpack("a6 a*", $buffer);
        $translated .= $lookup[oct('0b'.$token)];
        # add line separator if max length has been reached
        if (!$no_lines && !(++$line_len % $max_len)) {
            $translated .= $line_sep;
            $line_len = 0; 
        }
    }
    
    print $out_fh $translated;
    
    $prev_buffer = $buffer;
}

# add padding if necessary
my $rest = 6 - length($prev_buffer);
if ($rest && $prev_buffer =~ /\d/) {
    print $out_fh $lookup[oct('0b'.sprintf("%s%0${rest}s", unpack('a*'
+, $prev_buffer)))] . '=' x ($rest / 2);
}

close $out_fh;
close FILE if $ifile;

# if no outfile, write output to STDOUT
if (!$ofile) {
    open (TMP, '<', $tmp_fname) or die "could not read tmp file: $tmp_
+fname\n";
    while (read TMP, $buffer, $buff_size) {
        print $buffer;
    }
    close TMP;
}

# function for unpacking command line options
sub unpack_opts {
    my (%opts) = @_; 
    
    # get input file, default STDIN
    my $ifile = $opts{f} ? $opts{f} : 0;
    
    # get output file, default STDOUT
    my $ofile = $opts{o} ? $opts{o} : 0;
    
    # get buffer size, default 57
    my $buff_size = $opts{b} =~ /\d+/ ? $opts{b} : 57;
    
    # toggle lines
    my $no_lines = $opts{n} ? 1 : 0;
    
    # get max line length, default 76
    my $max_len = $opts{l} =~ /\d+/ ? $opts{l} : 76;
    
    # get line separator
    my $line_sep = $opts{s} ? $opts{s} : "\r\n";
    
    # toggle decode rather than encode
    my $decode_flag = $opts{d} ? 1 : 0;
    
    return ($ifile,
            $ofile,
            $buff_size,
            $no_lines,
            $max_len,
            $line_sep,
            $decode_flag);
}
[download]

In reply to tinkering with base64 encoding by temporal

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


laziness, impatience, and hubris
	PerlMonks