
Problems Getting the Template Correct when Using unpack()

by ozboomer (Friar)
on Oct 10, 2017 at 04:01 UTC ( [id://1201064]=perlquestion )

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:

Hi, all...

Another couple of funny things... with unpack() this time.

I'm opening a binary data file, setting binary mode... and then reading some blocks of data... but while I'm getting the data out of the file OK, translating/mapping it into some useful variables is causing me all sorts of trouble.

In the attached code, the data is placed into the '$inbuf' string so that it is stored the same way as when I'm reading the data directly from the file.

What I'm expecting to get out of the code is something like:-

Format: (TBA)
$event_len: 2Ex = 46
$date_str: 4/10/2017
$time_str: 00:00:00
$rec_type_idx: 10x = 16
$source_name: !WV2   !
$data: !CI=427 Lamp dim state=2 [SCATS=0].!
$tail_len: 2Ex = 46

*Note: The 'C' near the start of the $data output is actually a 080x value. I didn't want to post a binary value here.

...but what I'm actually getting as an output is as follows (my attempts at using varying unpack() templates):-

$inbuf: >.u[EOT] [DLE]WV2 CI=427 Lamp dim state=2 [SCATS=0].<

Format: C*
$event_len: 2Ex = 46
$date_str: 4/10/2017
$time_str: 00:00:00
$rec_type_idx: 10x = 16
$source_name: !87!
$data: !86!
$tail_len: 32x = 50

Format: C8a6C*
$event_len: 2Ex = 46
$date_str: 4/10/2017
$time_str: 00:00:00
$rec_type_idx: 10x = 16
$source_name: !WV2   !
$data: !128!
$tail_len: 49x = 73

Format: C8a*
$event_len: 2Ex = 46
$date_str: 4/10/2017
$time_str: 00:00:00
$rec_type_idx: 10x = 16
$source_name: !WV2   CI=427 Lamp dim state=2 [SCATS=0].!
$data: !!
$tail_len: 00x = 0

Format: C8a6C33C1
$event_len: 2Ex = 46
$date_str: 4/10/2017
$time_str: 00:00:00
$rec_type_idx: 10x = 16
$source_name: !WV2   !
$data: !128!
$tail_len: 49x = 73

Note that the 0x2E byte is the 'length of record' and appears both before and after the actual data record (hence the 0x2E values at both ends of the @data array in the code below).

Here's the code I'm using:

use strict;
use warnings;

my @data = (
    0x2E, 0x75, 0x0A, 0x04, 0x00, 0x00, 0x00, 0x10, 0x57, 0x56,
    0x32, 0x20, 0x20, 0x20, 0x80, 0x49, 0x3D, 0x34, 0x32, 0x37,
    0x20, 0x4C, 0x61, 0x6D, 0x70, 0x20, 0x64, 0x69, 0x6D, 0x20,
    0x73, 0x74, 0x61, 0x74, 0x65, 0x3D, 0x32, 0x20, 0x5B, 0x53,
    0x43, 0x41, 0x54, 0x53, 0x3D, 0x30, 0x5D, 0x2E
);

my @formats = ( "C*", "C8a6C*", "C8a*", "C8a6C33C1" );

my $inbuf = join('', map { chr($_) } @data);   # Input buffer

printf("\$inbuf: >%s<\n", $inbuf);
printf("\n");

foreach my $fmt (@formats) {
    my $tmpbuf = $inbuf;   # In case something's being corrupted

    my ($event_len, $year, $mon, $day, $hour, $min, $sec,
        $rec_type_idx, $source_name, $data, $tail_len) = unpack($fmt, $tmpbuf);

    my $date_str = sprintf("%s/%s/%s", $day, $mon, $year+1900);
    my $time_str = sprintf("%02d:%02d:%02d", $hour, $min, $sec);

    $rec_type_idx &= 0x1F;   # Only use b0-b6

    printf(" Format: %s\n", $fmt);
    printf(" \$event_len: %02Xx = %d\n", $event_len, $event_len);
    printf(" \$date_str: %s\n", $date_str);
    printf(" \$time_str: %s\n", $time_str);
    printf("\$rec_type_idx: %02Xx = %d\n", $rec_type_idx, $rec_type_idx);
    printf(" \$source_name: !%s!\n", $source_name);
    printf(" \$data: !%s!\n", $data);
    printf(" \$tail_len: %02Xx = %d\n", $tail_len, $tail_len);
    printf("\n");
}

Environment: ActiveState Perl v5.16.3, Windows 10 64-bit.

I'd greatly appreciate any clues as to what I should be trying to do with the unpack() template.

Replies are listed 'Best First'.
Re: Problems Getting the Template Correct when Using unpack()
by haukex (Archbishop) on Oct 10, 2017 at 06:38 UTC

    When unpacking data that contains a length record followed by data of that length, the / template character helps, although in this case it makes it a two-stage process:

    my ($data,$taillen) = unpack 'C/aC', $inbuf;
    my @recs = (length($data), unpack('C7a6A*',$data), $taillen);

    where as far as I can tell @recs is the same as your ($event_len, $year, ..., $tail_len). Also, TIMTOWTDI, the following return the same @recs, although I think these are a bit more ugly than the above:

    # "X" means "back up one byte"
    my @recs = unpack 'C8a6A*XC', $inbuf;
    chop $recs[-2];  # remove final "tail_len" C from the string

    # - or -

    my @recs = $inbuf=~m{\A (.) (.)(.)(.)(.)(.)(.)(.) (.{6}) (.*) (\1) \z}msxaa
        or die "didn't match";
    $_=ord for @recs[0..7,10];
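Since the original question mentions reading several blocks, here is a minimal sketch of how the `C/a C` idea might be applied to a buffer holding more than one record. The two records and the names `$buf`, `$pos` and `@records` are made up for illustration, and it assumes every record is framed as length byte, body, length byte.

```perl
use strict;
use warnings;

# Two made-up records back to back, each framed as <len> <body> <len>.
my $buf = "\x03abc\x03" . "\x05hello\x05";

my @records;
my $pos = 0;
while ($pos < length($buf)) {
    # 'C/a' reads a one-byte count, then that many bytes as a string;
    # the trailing 'C' picks up the repeated length byte.
    my ($body, $tail_len) = unpack 'C/a C', substr($buf, $pos);

    die "truncated or corrupt record at offset $pos\n"
        unless defined $tail_len && $tail_len == length($body);

    push @records, $body;
    $pos += length($body) + 2;   # body plus the two length bytes
}

print "$_\n" for @records;       # prints "abc" then "hello"
```

Each `$body` could then go through the second-stage `unpack('C7a6A*', ...)` shown above. The check on `$tail_len` is only a sanity test on the framing, not part of the suggested templates.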
Re: Problems Getting the Template Correct when Using unpack()
by AnomalousMonk (Archbishop) on Oct 10, 2017 at 05:10 UTC

    As I understand it, everything is a fixed-width field except the  $data field, the width of which must be computed based on total record length. If so, try this:

    Output:
    c:\@Work\Perl\monks\ozboomer>perl unpack_1.pl
    ".u\n\4\0\0\0\20WV2   \x80I=427 Lamp dim state=2 [SCATS=0]."
    (48, 46, 13)
    Format: 'x[C] C3 C3 C a6 a33 C'
    $body_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
    $rec_type_idx: 10x = 16
    $source_name: !WV2   !
    $data: !ÇI=427 Lamp dim state=2 [SCATS=0]!
    $tail_len: 2Ex = 46
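The script that produced this output is not shown here. What follows is only a rough sketch of the same idea, assuming the fixed-width part of the body is 13 bytes (seven single-byte fields plus the 6-byte source name), so the width of the data field is the record length minus 13. The names `$FIXED_LEN`, `$data_len` and `$template` are mine, and the template differs slightly from the one in the output (it reads the first length byte instead of skipping it).

```perl
use strict;
use warnings;

# The sample record from the original post, as a byte string.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"               # length, date, time, type
          . 'WV2   '                                          # source name (6 bytes)
          . "\x80" . 'I=427 Lamp dim state=2 [SCATS=0]'       # data (33 bytes)
          . "\x2E";                                           # trailing length

my $FIXED_LEN = 7 + 6;                     # assumed fixed-width part of the body
my $body_len  = unpack 'C', $inbuf;        # first byte is the record length
my $data_len  = $body_len - $FIXED_LEN;    # whatever is left belongs to the data field
die "record too short\n" if $data_len < 0;

# Build the template at run time so the data width follows the record length.
my $template = "C C3 C3 C a6 a$data_len C";

my ($event_len, $year, $mon, $day, $hour, $min, $sec,
    $rec_type_idx, $source_name, $data, $tail_len) = unpack $template, $inbuf;

printf "template : %s\n", $template;                      # C C3 C3 C a6 a33 C
printf "data     : %d bytes\n", length($data);            # 33 bytes
printf "tail_len : %d (event_len %d)\n", $tail_len, $event_len;   # 46 (event_len 46)
```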

    Update:

    *Note: The 'C' near the start of the $data output is actually a 080x value. I didn't want to post a binary value here.
    I don't understand this statement in the OP, so I'm ignoring it.


    Give a man a fish:  <%-{-{-{-<

      I still have to work my way through these notes... Fanx!

      ...but to clarify what I was talking about re:

      $data: !CI=427 Lamp dim state=2 [SCATS=0].!

      What I wrote was a mistake...

      What I meant to say was: I show a "C" character (ASCII decimal 67, 0x43) at the start of the '$data' list of characters, when in the @data array you'll see it's actually a "special" character (ASCII decimal 128, 0x80).

      I simply wasn't sure if the "special" character would upset some display thing if I included that unusual character in my posting... tha's all.

      "Sorry about that, Chief...

        ... I show a "C" character ... it's actually a "special" character (... 0x80).

        Is this character always a 0x80 character? If it's always the same, do you really care about extracting this character (which looks like some kind of delimiter/placeholder)? If it varies, how does it vary?

        I was thinking about a regex-based solution, and a constant 0x80 in the middle of the record could be a nice little anchor.
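A rough sketch of what such a regex might look like, purely as an illustration. It assumes the 0x80 byte is always present right after the 6-byte source name, which nobody in this thread has confirmed, and the variable names are invented.

```perl
use strict;
use warnings;

# The sample record from the original post, as a byte string.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"
          . 'WV2   '
          . "\x80" . 'I=427 Lamp dim state=2 [SCATS=0]'
          . "\x2E";

# (.)     leading length byte, which must reappear as the last byte (\1)
# (.{7})  date, time and record-type bytes
# (.{6})  source name
# \x80    the assumed constant marker, used as an anchor
# (.*)    the text that follows the marker
my ($len_chr, $header, $source_name, $text) =
    $inbuf =~ m{\A (.) (.{7}) (.{6}) \x80 (.*) \1 \z}xs
    or die "record did not match\n";

my ($year, $mon, $day, $hour, $min, $sec, $rec_type_idx) = unpack 'C7', $header;

printf "len=%d source=[%s] text=[%s]\n", ord($len_chr), $source_name, $text;
```

The /s flag matters here because the header bytes can include 0x0A, which `.` would otherwise refuse to match.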

        Update: Added  <blockquote> block to start of node.


        Give a man a fish:  <%-{-{-{-<

Re: Problems Getting the Template Correct when Using unpack()
by BillKSmith (Monsignor) on Oct 10, 2017 at 21:30 UTC
    Your last format is very close to correct. Because the result $data is a string, the format field that corresponds to it must be an 'A', not a 'C'.
    #my @formats = ( "C*", "C8a6C*", "C8a*", "C8a6C33C1");
    my @formats = ( "C8A6A33C1" );
    Bill

      By way of some more clarification, BTW...

      From reading pack, there was something about how 'A' strips whitespace, etc and 'a' doesn't do that when using unpack. For no particular reason, I wanted to retain the entire field, hence the usage of 'a' in the template.

        'A' strips whitespace, etc and 'a' doesn't do that

        True, but I think BillKSmith's point was that you were using 'C' instead of 'a'. From pack:

        a  A string with arbitrary binary data, will be null padded.
        A  A text (ASCII) string, will be space padded.
        Z  A null-terminated (ASCIZ) string, will be null padded.
        C  An unsigned char (octet) value.

        Here's the difference:

        use Data::Dump;
        my $data = "\x01\x02\x03\x20\x00";
        dd unpack 'C*', $data;       # prints (1, 2, 3, 32, 0)
        dd unpack 'a*', $data;       # prints "\1\2\3 \0"
        dd unpack 'A*', $data;       # prints "\1\2\3"
        dd unpack 'Z*', $data;       # prints "\1\2\3 "
        dd unpack 'Z*', $data."X ";  # prints "\1\2\3 "
        dd unpack 'A*', $data."X ";  # prints "\1\2\3 \0X"
        dd unpack 'a*', $data."X ";  # prints "\1\2\3 \0X "

        So if you want to retain all of the binary data as a string for further unpacking, the 'a' template is the way to go. 'C' will split that string into its individual bytes and return those values.
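To illustrate the "further unpacking" part, here is a small sketch with a made-up payload (not data from the thread): the bytes are kept as one string with 'a*', and that string is turned into byte values or a hex dump only when needed.

```perl
use strict;
use warnings;

# Made-up payload containing a NUL and a trailing space.
my $raw = "\x80\x00\x01\x20";

my ($kept) = unpack 'a*', $raw;    # keeps every byte, nothing stripped
my @bytes  = unpack 'C*', $kept;   # second stage: individual byte values
my $hex    = unpack 'H*', $kept;   # or a hex dump, high nybble first
my $again  = pack 'C*', @bytes;    # the byte list packs back to the same string

print "bytes: @bytes\n";           # bytes: 128 0 1 32
print "hex:   $hex\n";             # hex:   80000120
print "round trip ", ($again eq $raw ? "ok" : "failed"), "\n";
```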

        Update: Added the second 'A*' example.

Re: Problems Getting the Template Correct when Using unpack()
by ozboomer (Friar) on Oct 10, 2017 at 22:49 UTC

    One thing that may not be clear... is that the '$data' item is not always going to be a string, hence my attempt to use 'C' in the template.

    For some values of the '$rec_type_idx', the '$data' item may contain text (as in the original example) but it can be actual binary data (including NULs) that needs further decoding.

      Your specification of C33 returns a list of 33 unsigned numbers. You need an array to hold them. Another option is to use the array, but fill it with a list of characters ( use (a)33 ).
      C:\Users\Bill\forums\monks>type ozboomer.pl
      use strict;
      use warnings;

      my @data = (
          0x2E, 0x75, 0x0A, 0x04, 0x00, 0x00, 0x00, 0x10, 0x57, 0x56,
          0x32, 0x20, 0x20, 0x20, 0x80, 0x49, 0x3D, 0x34, 0x32, 0x37,
          0x20, 0x4C, 0x61, 0x6D, 0x70, 0x20, 0x64, 0x69, 0x6D, 0x20,
          0x73, 0x74, 0x61, 0x74, 0x65, 0x3D, 0x32, 0x20, 0x5B, 0x53,
          0x43, 0x41, 0x54, 0x53, 0x3D, 0x30, 0x5D, 0x2E
      );

      my @formats = ( "C8a6C33C1", "C8A6(A)33C1" );

      my $inbuf = join('', map { chr($_) } @data);   # Input buffer

      printf("\$inbuf: >%s<\n", $inbuf);
      printf("\n");

      foreach my $fmt (@formats) {
          my $tmpbuf = $inbuf;   # In case something's being corrupted

          my ($event_len, $tail_len);
          my ($year, $mon, $day);
          my ($hour, $min, $sec);
          my $rec_type_idx;
          my $source_name;
          my @data;

          ($event_len, $year, $mon, $day, $hour, $min, $sec,
           $rec_type_idx, $source_name, @data[0..32], $tail_len) = unpack($fmt, $tmpbuf);

          my $date_str = sprintf("%s/%s/%s", $day, $mon, $year+1900);
          my $time_str = sprintf("%02d:%02d:%02d", $hour, $min, $sec);

          $rec_type_idx &= 0x1F;   # Only use b0-b6

          printf(" Format: %s\n", $fmt);
          printf(" \$event_len: %02Xx = %d\n", $event_len, $event_len);
          printf(" \$date_str: %s\n", $date_str);
          printf(" \$time_str: %s\n", $time_str);
          printf("\$rec_type_idx: %02Xx = %d\n", $rec_type_idx, $rec_type_idx);
          printf(" \$source_name: !%s!\n", $source_name);
          printf(" \$data: !%s!\n", join(' ', @data));
          printf(" \$tail_len: %02Xx = %d\n", $tail_len, $tail_len);
          printf("\n");
      }

      C:\Users\Bill\forums\monks>perl ozboomer.pl
      $inbuf: >.u ♦ ►WV2 ÇI=427 Lamp dim state=2 [SCATS=0].<

      Format: C8a6C33C1
      $event_len: 2Ex = 46
      $date_str: 4/10/2017
      $time_str: 00:00:00
      $rec_type_idx: 10x = 16
      $source_name: !WV2   !
      $data: !128 73 61 52 50 55 32 76 97 109 112 32 100 105 109 32 115 116 97 116 101 61 50 32 91 83 67 65 84 83 61 48 93!
      $tail_len: 2Ex = 46

      Format: C8A6(A)33C1
      $event_len: 2Ex = 46
      $date_str: 4/10/2017
      $time_str: 00:00:00
      $rec_type_idx: 10x = 16
      $source_name: !WV2!
      $data: !Ç I = 4 2 7 L a m p d i m s t a t e = 2 [ S C A T S = 0 ]!
      $tail_len: 2Ex = 46

      I recommend that you do return $data as a string (use a33). Yes, you do need additional processing, but string processing is what perl does best.
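One way the "additional processing" could be organised, sketched under heavy assumptions: keep $data as a raw string and pick a decoder based on the record type. Only type 16 carrying text comes from the sample in this thread. Type 1, its two 16-bit big-endian counters, and the names `%decoder` and `decode_data` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical decoders keyed on record type; each receives the raw string.
my %decoder = (
    16 => sub { my ($raw) = @_; return +{ text => $raw } },
    1  => sub { my ($raw) = @_; return +{ counters => [ unpack 'n2', $raw ] } },
);

sub decode_data {
    my ($rec_type_idx, $raw) = @_;
    my $code = $decoder{$rec_type_idx};
    # Unknown type: fall back to the plain list of byte values.
    return +{ bytes => [ unpack 'C*', $raw ] } unless $code;
    return $code->($raw);
}

my $text_rec = decode_data(16, 'Lamp dim state=2');
my $bin_rec  = decode_data(1,  "\x00\x01\x01\x00");
my $unknown  = decode_data(99, "\x01\x02");

print "text:     $text_rec->{text}\n";              # text:     Lamp dim state=2
print "counters: @{ $bin_rec->{counters} }\n";      # counters: 1 256
print "bytes:    @{ $unknown->{bytes} }\n";         # bytes:    1 2
```

With this arrangement the first-stage template can stay the same for every record type, and only the per-type decoder needs to know what the bytes mean.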

      Bill
Re: Problems Getting the Template Correct when Using unpack()
by ozboomer (Friar) on Oct 11, 2017 at 23:36 UTC
    Your specification of C33 returns a list of 33 unsigned numbers. You need an array to hold them. Another option is to use the array, but fill it with a list of characters ( use (a)33 ).

    Ahhh..! Tha's the biggest clue, methinks... and something I was (half) mindful of - treating the list of bytes as a string when it really should be an array.

    So, the way I'm looking to proceed now is along the following lines (code follows):-

    Again, I'm very appreciative of everyone's help in helping me get around this annoyingly 'simple' problem... :)
