PerlMonks  

Length of unpacked string for "hex" data

by SirBones (Friar)
on Apr 24, 2006 at 17:35 UTC (id://545333, perlquestion)

SirBones has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I'm writing a data file dumper. Gee, that's never been done in Perl before, has it? :-) I've looked around a bit for a simple answer to this, but I admit to being pretty new to pack/unpack intricacies, so I humbly beg your pardon if I'm missing something obvious.

The files in question are constructed (besides some fixed-length header records) of repeating fields, each consisting of a 2-byte ASCII keyword, a 1-byte length (byte count), and a data field whose size is given by the length field. The data field can contain either ASCII or binary data, depending on the keyword. For example:

RT | 0x07 | testing
CD | 0x08 | 0x01020304FFDDEC19
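Because every field is a keyword, a one-byte count, and then that many bytes, a whole run of fields can be split with a single unpack using a repeated group. A minimal sketch, assuming the fields are concatenated back to back with no padding (the fixed-length header records are ignored here). The lowercase "a" takes the data bytes verbatim, whatever the keyword:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a buffer holding the two example fields back to back.
my $buf = pack("A2 C a7", "RT", 7, "testing")
        . pack("A2 C H16", "CD", 8, "01020304FFDDEC19");

# "(A2 C/a)*" repeats the group until the buffer is used up:
# A2 = keyword, C/a = one length byte followed by that many raw bytes.
my @fields = unpack("(A2 C/a)*", $buf);

# Walk the flat (keyword, data, keyword, data, ...) list two at a time.
while (my ($kw, $data) = splice(@fields, 0, 2)) {
    printf "%s (%d bytes)\n", $kw, length $data;
}
```

Formatting each data field as text or hex can then be decided per keyword after extraction.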

Perfect for "unpack", correct? My only hangup is trying to generalize the loop that spits out the keywords and data. The data needs to be displayed as either hex or ASCII, depending on the keyword. So I have a hash that maps each keyword to either "A" or "H". I then use the contents of the hash as the type specifier in the unpack template:

C/$kwfmt{$kw}

When the data is ASCII, this works fine. But when it's hex, I only get half of my string displayed, because the "H" format uses the number of nibbles as its length, whereas my length specification is (of course) in bytes. I can (and currently do) handle this in a kludgy way by checking the format during the loop and then using a different template for "A" or "H"; but it's ugly.

I tried throwing in a repeat value after the C/$kwfmt{$kw} but of course then Perl says I can't use a count with the "/" specifier. I've also tried various means of doubling the "C" field within the template when the data is hex but that hasn't worked out either.
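The doubling can be done by reading the byte count first and building the repeat count yourself, so a single code path serves both formats. A minimal sketch; %units_per_byte is a hypothetical table (not from the post) giving how many output characters one data byte produces for each format letter:

```perl
use strict;
use warnings;

my $rec = pack("A2 C H16", "CD", 8, "01020304FFDDEC19");

# C/H treats the stored byte count (8) as a nibble count: half the data.
my ($short) = unpack("x2 C/H", $rec);
print "$short\n";    # 01020304

# Hypothetical table: output characters per data byte for each format.
my %units_per_byte = (A => 1, H => 2);

# Read the byte count, then scale it into the repeat count for the format.
my $fmt    = "H";
my $len    = unpack("x2 C", $rec);
my ($full) = unpack("x3 $fmt" . $len * $units_per_byte{$fmt}, $rec);
print "$full\n";     # 01020304ffddec19
```

Note that unpack 'H' returns lowercase hex digits; wrap the result in uc() if uppercase is wanted.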

Here's a small demo to exemplify my dilemma. It prints the ASCII string correctly, but just half of the hex string:

#!/usr/bin/perl -w
use strict;

# Some example "keywords" and how they should be displayed
my %kwfmt = (
    "RT" => "A",
    "PN" => "A",
    "SN" => "A",
    "AB" => "H",
    "CD" => "H",
    "B1" => "H",
);

# The kind of thing I will see in my file
my @record;
$record[0] = pack ("A2CA7", "RT", 7, "testing");
$record[1] = pack ("A2CH16", "CD", 8, "01020304FFDDEC19");

# Prints ASCII fields fine, truncates hex
for (my $i=0; $i<2; $i++) {
    my ($kw) = unpack("A2", $record[$i]);
    my ($rdata) = unpack("x2C/$kwfmt{$kw}", $record[$i]);
    print "$kw $rdata\n";
}

The output of which is:

RT testing
CD 01020304

As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

Thanks (as usual) so much.

Ken

"This bounty hunter is my kind of scum: Fearless and inventive." --J.T. Hutt

Re: Length of unpacked string for "hex" data
by ikegami (Patriarch) on Apr 24, 2006 at 20:07 UTC

    Extract the field, then convert to hex if needed:

    my %hex = map { $_ => 1 } qw( AB CD B1 );

    my @records = (
        pack("A2Ca7", "RT", 7, "testing"),
        pack("A2CH16", "CD", 8, "01020304FFDDEC19"),
    );

    foreach my $record (@records) {
        my ($kw, $rdata) = unpack("A2C/a", $record);
        $rdata = uc(unpack('H*', $rdata)) if $hex{$kw};
        print "$kw $rdata\n";
    }

    or use a dispatch table to avoid the if:

    sub format_as_text { return $_[0]; }
    sub format_as_hex  { return uc(unpack('H*', $_[0])); }

    my %kwfmt = (
        RT => \&format_as_text,
        PN => \&format_as_text,
        SN => \&format_as_text,
        AB => \&format_as_hex,
        CD => \&format_as_hex,
        B1 => \&format_as_hex,
    );

    my @records = (
        pack("A2Ca7", "RT", 7, "testing"),
        pack("A2CH16", "CD", 8, "01020304FFDDEC19"),
    );

    foreach (@records) {
        my ($kw, $rdata) = unpack("A2C/a", $_);
        $rdata = $kwfmt{$kw}->($rdata);
        print "$kw $rdata\n";
    }
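    One case the dispatch table leaves open is a keyword missing from %kwfmt: $kwfmt{$kw} is then undef and calling it dies at run time. A minimal sketch of a fallback; the helper format_field and the choice of hex as the default are assumptions (the thread doesn't say how unknown keywords should be shown), picked because hex is safe for arbitrary bytes:

```perl
use strict;
use warnings;

sub format_as_text { return $_[0]; }
sub format_as_hex  { return uc(unpack('H*', $_[0])); }

my %kwfmt = (
    RT => \&format_as_text,
    CD => \&format_as_hex,
);

# Hypothetical helper: unknown keywords fall back to hex output.
sub format_field {
    my ($kw, $data) = @_;
    my $formatter = exists $kwfmt{$kw} ? $kwfmt{$kw} : \&format_as_hex;
    return $formatter->($data);
}

my @records = (
    pack("A2Ca7",  "RT", 7, "testing"),
    pack("A2CH16", "CD", 8, "01020304FFDDEC19"),
    pack("A2Ca2",  "ZZ", 2, "\x00\xFF"),    # keyword not in %kwfmt
);

for my $record (@records) {
    my ($kw, $rdata) = unpack("A2C/a", $record);
    print "$kw ", format_field($kw, $rdata), "\n";
}
```

    With this, the third record prints as "ZZ 00FF" instead of aborting the dump.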

    By the way, there's no reason to hardcode the number of elements in the array. Instead of
    for (my $i=0; $i<2; $i++)
    you should use
    for (my $i=0; $i<@records; $i++)
    Or better yet, use
    for my $i (0..$#records)
    since it's easier to read and just as efficient. I used
    foreach my $record (@records)
    since it's even simpler and we didn't care about the record index.

      Thanks, very cool. And I like the dispatch table trick as well; seems I have another app where that will be useful. For the present case I'll stick with your "if"; it's less intrusive than mine.

      I noticed you used a lower-case "a" in the template:

      "A2C/a"

      Does that mean parsing it as a null-padded string rather than a space-padded one? Is that important here?

      Cheers,
      -Ken


        'A' will remove trailing NULs and whitespace.
        'a' will not.

        In other words,
        ($str) = unpack('C/A', $data);
        is equivalent to
        ($str) = unpack('C/a', $data);
        $str =~ s/[\0\s]+$//;
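        That difference is why 'a' matters for the hex fields: binary data can legitimately end in 0x00 or 0x20 bytes, and 'A' would silently drop them. A small sketch with a made-up 4-byte field:

```perl
use strict;
use warnings;

# A 4-byte binary field whose last two bytes are 0x20 (space) and 0x00 (NUL).
my $rec = pack("A2 C a4", "CD", 4, "\x01\x02\x20\x00");

my (undef, $with_A) = unpack("A2 C/A", $rec);   # strips trailing NUL/space
my (undef, $with_a) = unpack("A2 C/a", $rec);   # keeps every byte

print length($with_A), " ", uc(unpack("H*", $with_A)), "\n";  # 2 0102
print length($with_a), " ", uc(unpack("H*", $with_a)), "\n";  # 4 01022000
```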

Re: Length of unpacked string for "hex" data
by wedgef5 (Scribe) on Apr 24, 2006 at 18:54 UTC
    I'm far from an expert on pack/unpack myself. In fact, I saw your post as a chance to learn a little! I think perhaps you should make the 'C' an 'I' in your templates, and then specify the hex data's length as 16 (nibbles, not bytes). The following worked for me; at least I got all 8 bytes of the hex back.

    $record[0] = pack ("A2IA7", "RT", 7, "testing");
    $record[1] = pack ("A2IH16", "CD", 16, "01020304FFDDEC19");

    # The stored count for the hex field is now 16 nibbles, so it is no longer truncated
    for (my $i=0; $i<2; $i++) {
        my ($kw) = unpack("A2", $record[$i]);
        my ($rdata) = unpack("x2I/$kwfmt{$kw}", $record[$i]);
        print "$kw $rdata\n";
    }

      Thanks for the suggestion. Of course I've managed to present a bad example. Your suggestion of "I" in the template only works if the hex string is 8 bytes or shorter (from what I can see). In reality my hex strings may be significantly longer.

      Ken

Re: Length of unpacked string for "hex" data
by swampyankee (Parson) on Apr 24, 2006 at 21:35 UTC
    As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

    I suspect that dealing with nybbles rather than bytes makes it easier when one has to deal with BCD (binary-coded decimal).
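    As an illustration: in packed BCD each decimal digit occupies one nibble, so a nibble-granular count lets you pull out an odd number of digits directly. A sketch, assuming a made-up 3-byte field holding the digits 12345 followed by one padding nibble:

```perl
use strict;
use warnings;

# Five BCD digits in three bytes: 0x12 0x34 0x50 (the last nibble is padding).
my $bcd = pack("H*", "123450");

my ($digits) = unpack("H5", $bcd);   # five nibbles = five digits
print "$digits\n";                   # 12345
print $digits + 1, "\n";             # 12346
```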

    emc

    "Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. "
    —G. Steele
Re: Length of unpacked string for "hex" data
by ikegami (Patriarch) on Apr 24, 2006 at 21:59 UTC
    As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

    The repeat count pertains to the number of inputs for pack and to the number of outputs for unpack. This is consistent across (almost?) all formats.

    For pack 'H', the repeat count translates to a nibble count because each character passed to pack 'H' contains one nibble of information. Similarly, for unpack 'H', the repeat count translates to a nibble count because each character returned by unpack 'H' contains one nibble of information.
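    The rule is easiest to see with an odd count: three hex characters go in and three come back out, even though they occupy two bytes in between. A short sketch:

```perl
use strict;
use warnings;

# pack 'H3' consumes three input characters (three nibbles).
my $bytes = pack("H3", "abc");
printf "%d bytes: %vX\n", length $bytes, $bytes;   # 2 bytes: AB.C0

# unpack 'H3' produces three output characters (three nibbles).
my ($hex) = unpack("H3", $bytes);
print "$hex\n";                                    # abc

# A byte count must therefore be doubled to get the matching nibble count.
my $nbytes = 2;
print unpack("H" . 2 * $nbytes, $bytes), "\n";     # abc0
```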

Approved by sweetblood