Problems Getting the Template Correct when Using unpack()

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:

Hi, all...

Another couple of funny things... with unpack() this time.

I'm opening a binary data file, setting binary mode... and then reading some blocks of data... but while I'm getting the data out of the file Ok, translating/mapping it into some useful variables is causing me all sorts of troubles.

In the attached code, the data is placed into the '$inbuf' string so that it is stored the same way as when I'm reading the data directly from the file.

What I'm expecting to get out of the code is something like:-

       Format: (TBA)
   $event_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
$rec_type_idx: 10x = 16
 $source_name: !WV2   !
        $data: !CI=427 Lamp dim state=2 [SCATS=0].!
    $tail_len: 2Ex = 46

*Note: The 'C' near the start of the $data output is actually a 080x value. I didn't want to post a binary value here.

...but what I'm actually getting as an output is as follows (my attempts at using varying unpack() templates):-

$inbuf: >.u[EOT]   [DLE]WV2   CI=427 Lamp dim state=2 [SCATS=0].<

        Format: C*
   $event_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
$rec_type_idx: 10x = 16
 $source_name: !87!
        $data: !86!
    $tail_len: 32x = 50

        Format: C8a6C*
   $event_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
$rec_type_idx: 10x = 16
 $source_name: !WV2   !
        $data: !128!
    $tail_len: 49x = 73

        Format: C8a*
   $event_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
$rec_type_idx: 10x = 16
 $source_name: !WV2   CI=427 Lamp dim state=2 [SCATS=0].!
        $data: !!
    $tail_len: 00x = 0

        Format: C8a6C33C1
   $event_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
$rec_type_idx: 10x = 16
 $source_name: !WV2   !
        $data: !128!
    $tail_len: 49x = 73

Note that the 0x2E byte is the 'length of record' and appears both before and after the actual data record (hence the 0x2E values at each end of the @data array in the code below).
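The framing described here (a length byte, the record body, then the same length byte again) can be checked before any field-level unpacking. The following is only a sketch, not part of the original post: `$inbuf` is rebuilt from the bytes listed in the question, and the variable names `$head_len`, `$body_len` and `$framing_ok` are made up for illustration.

```perl
use strict;
use warnings;

# Sample record from the question: length byte, 46-byte body, length byte.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"          # length + date/time + record type
          . "WV2   "                                    # 6-byte source name
          . "\x80I=427 Lamp dim state=2 [SCATS=0]"      # 33-byte data field
          . "\x2E";                                     # trailing length byte

my $head_len = unpack 'C', $inbuf;               # first byte of the buffer
my $tail_len = unpack 'C', substr($inbuf, -1);   # last byte of the buffer
my $body_len = length($inbuf) - 2;               # everything between the two

# The record is well framed only if all three agree.
my $framing_ok = ($head_len == $tail_len && $head_len == $body_len) ? 1 : 0;

printf "head=%d tail=%d body=%d ok=%d\n",
    $head_len, $tail_len, $body_len, $framing_ok;
```

For the sample record all three lengths come out as 46, which is why the body can be treated as 7 header bytes, a 6-byte name and 33 bytes of data.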

Here's the code I'm using:


use strict;
use warnings;

my @data = ( 0x2E, 0x75, 0x0A, 0x04, 0x00, 0x00, 0x00, 0x10, 0x57, 0x56,
             0x32, 0x20, 0x20, 0x20, 0x80, 0x49, 0x3D, 0x34, 0x32, 0x37,
             0x20, 0x4C, 0x61, 0x6D, 0x70, 0x20, 0x64, 0x69, 0x6D, 0x20,
             0x73, 0x74, 0x61, 0x74, 0x65, 0x3D, 0x32, 0x20, 0x5B, 0x53,
             0x43, 0x41, 0x54, 0x53, 0x3D, 0x30, 0x5D, 0x2E );


my @formats = ( "C*", "C8a6C*", "C8a*", "C8a6C33C1" );
             
my $inbuf = join('', map { chr($_) } @data);  # Input buffer


printf("\$inbuf: >%s<\n", $inbuf);
printf("\n");

foreach my $fmt (@formats) {
   my $tmpbuf = $inbuf;  # In case something's being corrupted
   my ($event_len, $year, $mon, $day, $hour, $min, $sec, 
       $rec_type_idx, $source_name, $data, $tail_len) = unpack($fmt, $tmpbuf);
   
   my $date_str = sprintf("%s/%s/%s", $day, $mon, $year+1900);
   my $time_str = sprintf("%02d:%02d:%02d", $hour, $min, $sec);
      $rec_type_idx &= 0x1F;      # Only use b0-b6 
      
   printf("        Format: %s\n", $fmt);
   printf("   \$event_len: %02Xx = %d\n", $event_len, $event_len);   
   printf("    \$date_str: %s\n", $date_str);   
   printf("    \$time_str: %s\n", $time_str);   
   printf("\$rec_type_idx: %02Xx = %d\n", $rec_type_idx, $rec_type_idx);
   printf(" \$source_name: !%s!\n", $source_name);   
   printf("        \$data: !%s!\n", $data);   
   printf("    \$tail_len: %02Xx = %d\n", $tail_len, $tail_len);   
   printf("\n");
   
}

Environment: ActiveState Perl v5.16.3, Windows 10 64-bit.

I'd greatly appreciate any clues as to what I should be trying to do with the unpack() template.


Replies are listed 'Best First'.
Re: Problems Getting the Template Correct when Using unpack() by haukex (Archbishop) on Oct 10, 2017 at 06:38 UTC
When unpacking data that contains a length record followed by data of that length, the `/` template character helps, although in this case it makes it a two-stage process:

    my ($data,$taillen) = unpack 'C/aC', $inbuf;
    my @recs = (length($data), unpack('C7a6A*',$data), $taillen);

where as far as I can tell `@recs` is the same as your `($event_len, $year, ..., $tail_len)`.

Also, TIMTOWTDI, the following return the same `@recs`, although I think these are a bit more ugly than the above:

    # "X" means "back up one byte"
    my @recs = unpack 'C8a6A*XC', $inbuf;
    chop $recs[-2];  # remove final "tail_len" C from the string

    # - or -
    my @recs = $inbuf=~m{\A (.) (.)(.)(.)(.)(.)(.)(.) (.{6}) (.*) (\1) \z}msxaa
        or die "didn't match";
    $_=ord for @recs[0..7,10];
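For reference, here is a self-contained sketch of the two-stage `/` approach applied to the sample record. It is an illustration only: `$inbuf` is rebuilt from the bytes in the question, the variable names are chosen here, and it uses `a*` for the last field (an assumption, made so that trailing NULs or spaces in binary data would be kept).

```perl
use strict;
use warnings;

# Sample record from the question: length byte, 46-byte body, length byte.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"
          . "WV2   "
          . "\x80I=427 Lamp dim state=2 [SCATS=0]"
          . "\x2E";

# Stage 1: 'C/a' reads a one-byte length and then that many bytes as a string;
# the final 'C' picks up the trailing length byte.
my ($body, $tail_len) = unpack 'C/a C', $inbuf;

# Stage 2: split the body into 7 single-byte fields, the 6-byte name,
# and whatever is left over as the data field.
my ($year, $mon, $day, $hour, $min, $sec, $rec_type_idx, $source_name, $data)
    = unpack 'C7 a6 a*', $body;

printf "body=%d bytes, date=%d/%d/%d, name=!%s!, data=%d bytes, tail=%d\n",
    length($body), $day, $mon, $year + 1900,
    $source_name, length($data), $tail_len;
```

The length of `$body` doubles as the `$event_len` value, so nothing from the original field list is lost.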
Re: Problems Getting the Template Correct when Using unpack() by AnomalousMonk (Archbishop) on Oct 10, 2017 at 05:10 UTC
As I understand it, everything is a fixed-width field except the `$data` field, the width of which must be computed based on total record length. If so, try this (the code itself sits behind a "Read more... (2 kB)" link in the original node).

Output:

    c:\@Work\Perl\monks\ozboomer>perl unpack_1.pl
    ".u\n\4\0\0\0\20WV2   \x80I=427 Lamp dim state=2 [SCATS=0]."
    (48, 46, 13)
    Format: 'x[C] C3 C3 C a6 a33 C'
    $body_len: 2Ex = 46
    $date_str: 4/10/2017
    $time_str: 00:00:00
    $rec_type_idx: 10x = 16
    $source_name: !WV2   !
    $data: !ĮI=427 Lamp dim state=2 [SCATS=0]!
    $tail_len: 2Ex = 46

Update:

> *Note: The 'C' near the start of the $data output is actually a 080x value. I didn't want to post a binary value here.

I don't understand this statement in the OP, so I'm ignoring it.

Give a man a fish: `<%-{-{-{-<`
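Since the code for this reply is not shown, the following is a hedged sketch of the idea it describes, namely computing the width of the variable field from the total record length and building the template to match. It is not AnomalousMonk's actual code; `$fixed_len` and the exact template layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample record from the question: length byte, 46-byte body, length byte.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"
          . "WV2   "
          . "\x80I=427 Lamp dim state=2 [SCATS=0]"
          . "\x2E";

# Bytes taken by the fixed-width fields:
# leading length (1) + date/time/type (7) + source name (6) + trailing length (1).
my $fixed_len = 1 + 7 + 6 + 1;
my $data_len  = length($inbuf) - $fixed_len;
die "record too short" if $data_len < 0;

# Build the template with the computed width for the data field.
my $fmt = sprintf 'C C3 C3 C a6 a%d C', $data_len;

my ($event_len, $year, $mon, $day, $hour, $min, $sec,
    $rec_type_idx, $source_name, $data, $tail_len) = unpack $fmt, $inbuf;

printf "Format: '%s'\n", $fmt;
printf "name=!%s! data_len=%d tail=%02Xx\n",
    $source_name, length($data), $tail_len;
```

Because the data field is unpacked with `a`, it stays a single string and the list returned by `unpack` lines up one-to-one with the eleven scalars.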
Re^2: Problems Getting the Template Correct when Using unpack() by ozboomer (Friar) on Oct 10, 2017 at 10:38 UTC
I still have to work my way through these notes... Fanx!

...but to clarify what I was talking about re:

    $data: !CI=427 Lamp dim state=2 [SCATS=0].!

What I wrote was a mistake... What I meant to say was: I show a "C" character (ASCII decimal 67, 0x43) at the start of the '$data' list of characters, when in the @data array you'll see it's actually a "special" character (decimal 128, 0x80). I simply wasn't sure if the "special" character would upset some display thing if I included that unusual character in my posting... tha's all.

"Sorry about that, Chief..."
Re^3: Problems Getting the Template Correct when Using unpack() by AnomalousMonk (Archbishop) on Oct 10, 2017 at 10:53 UTC
> ... I show a "C" character ... it's actually a "special" character (... 0x80).

Is this character always a 0x80 character? If it's always the same, do you really care about extracting this character (which looks like some kind of delimiter/placeholder)? If it varies, how does it vary? I was thinking about a regex-based solution, and a constant 0x80 in the middle of the record could be a nice little anchor.

Update: Added `<blockquote>` block to start of node.
Re^4: Problems Getting the Template Correct when Using unpack() by ozboomer (Friar) on Oct 10, 2017 at 21:09 UTC
Re: Problems Getting the Template Correct when Using unpack() by BillKSmith (Monsignor) on Oct 10, 2017 at 21:30 UTC
Your last format is very close to correct. Because the result $data is a string, the format field that corresponds to it must be an 'A', not a 'C'.

    #my @formats = ( "C*", "C8a6C*", "C8a*", "C8a6C33C1");
    my @formats = ( "C8A6A33C1" );

Bill
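To make the difference concrete, here is a small sketch (not from the thread) that counts the values each template returns for the sample record. `$inbuf` is rebuilt from the bytes in the question, and `@as_bytes` / `@as_string` are names invented for this illustration.

```perl
use strict;
use warnings;

# Sample record from the question: length byte, 46-byte body, length byte.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"
          . "WV2   "
          . "\x80I=427 Lamp dim state=2 [SCATS=0]"
          . "\x2E";

# 'C33' yields 33 separate numbers; 'a33' yields one 33-byte string.
my @as_bytes  = unpack 'C8 a6 C33 C', $inbuf;
my @as_string = unpack 'C8 a6 a33 C', $inbuf;

printf "C33 template returns %d values\n", scalar @as_bytes;
printf "a33 template returns %d values\n", scalar @as_string;

# With 'C33', the tenth and eleventh values are just the first two data bytes,
# which is what ends up in $data and $tail_len in an 11-scalar assignment.
printf "values 10 and 11 with C33: %d, %d\n", @as_bytes[9, 10];
printf "value 11 with a33: %d\n", $as_string[10];
```

The `128` and `73` picked up by the `C33` version match the `$data: !128!` and `$tail_len: 49x = 73` lines in the question's output.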
Re^2: Problems Getting the Template Correct when Using unpack() by ozboomer (Friar) on Oct 10, 2017 at 22:37 UTC
By way of some more clarification, BTW... From reading pack, there was something about how 'A' strips whitespace, etc and 'a' doesn't do that when using unpack. For no particular reason, I wanted to retain the entire field, hence the usage of 'a' in the template.
Re^3: Problems Getting the Template Correct when Using unpack() by haukex (Archbishop) on Oct 11, 2017 at 07:58 UTC
> 'A' strips whitespace, etc and 'a' doesn't do that

True, but I think BillKSmith's point was that you were using 'C' instead of 'a'. From pack:

    a  A string with arbitrary binary data, will be null padded.
    A  A text (ASCII) string, will be space padded.
    Z  A null-terminated (ASCIZ) string, will be null padded.
    C  An unsigned char (octet) value.

Here's the difference:

    use Data::Dump;
    my $data = "\x01\x02\x03\x20\x00";
    dd unpack 'C*', $data;       # prints (1, 2, 3, 32, 0)
    dd unpack 'a*', $data;       # prints "\1\2\3 \0"
    dd unpack 'A*', $data;       # prints "\1\2\3"
    dd unpack 'Z*', $data;       # prints "\1\2\3 "
    dd unpack 'Z*', $data."X ";  # prints "\1\2\3 "
    dd unpack 'A*', $data."X ";  # prints "\1\2\3 \0X"
    dd unpack 'a*', $data."X ";  # prints "\1\2\3 \0X "

So if you want to retain all of the binary data as a string for further unpacking, the 'a' template is the way to go. 'C' will split that string into its individual bytes and return those values.

Update: Added the second `'A*'` example.
Re: Problems Getting the Template Correct when Using unpack() by ozboomer (Friar) on Oct 10, 2017 at 22:49 UTC
One thing that may not be clear... is that the '$data' item is not always going to be a string, hence my attempt to use 'C' in the template. For some values of the '$rec_type_idx', the '$data' item may contain text (as in the original example) but it can be actual binary data (including NULs) that needs further decoding.
Re^2: Problems Getting the Template Correct when Using unpack() by BillKSmith (Monsignor) on Oct 11, 2017 at 12:37 UTC
Your specification of C33 returns a list of 33 unsigned numbers. You need an array to hold them. Another option is to use the array, but fill it with a list of characters ( use (a)33 ).

    C:\Users\Bill\forums\monks>type ozboomer.pl
    use strict;
    use warnings;

    my @data = ( 0x2E, 0x75, 0x0A, 0x04, 0x00, 0x00, 0x00, 0x10, 0x57, 0x56,
                 0x32, 0x20, 0x20, 0x20, 0x80, 0x49, 0x3D, 0x34, 0x32, 0x37,
                 0x20, 0x4C, 0x61, 0x6D, 0x70, 0x20, 0x64, 0x69, 0x6D, 0x20,
                 0x73, 0x74, 0x61, 0x74, 0x65, 0x3D, 0x32, 0x20, 0x5B, 0x53,
                 0x43, 0x41, 0x54, 0x53, 0x3D, 0x30, 0x5D, 0x2E );

    my @formats = ( "C8a6C33C1", "C8A6(A)33C1" );

    my $inbuf = join('', map { chr($_) } @data);  # Input buffer

    printf("\$inbuf: >%s<\n", $inbuf);
    printf("\n");

    foreach my $fmt (@formats) {
       my $tmpbuf = $inbuf;  # In case something's being corrupted
       my ($event_len, $tail_len);
       my ($year, $mon, $day);
       my ($hour, $min, $sec);
       my $rec_type_idx;
       my $source_name;
       my @data;
       ($event_len, $year, $mon, $day, $hour, $min, $sec,
           $rec_type_idx, $source_name, @data[0..32], $tail_len) = unpack($fmt, $tmpbuf);

       my $date_str = sprintf("%s/%s/%s", $day, $mon, $year+1900);
       my $time_str = sprintf("%02d:%02d:%02d", $hour, $min, $sec);
          $rec_type_idx &= 0x1F;      # Only use b0-b6

       printf("        Format: %s\n", $fmt);
       printf("   \$event_len: %02Xx = %d\n", $event_len, $event_len);
       printf("    \$date_str: %s\n", $date_str);
       printf("    \$time_str: %s\n", $time_str);
       printf("\$rec_type_idx: %02Xx = %d\n", $rec_type_idx, $rec_type_idx);
       printf(" \$source_name: !%s!\n", $source_name);
       printf("        \$data: !%s!\n", join(' ', @data));
       printf("    \$tail_len: %02Xx = %d\n", $tail_len, $tail_len);
       printf("\n");
    }

    C:\Users\Bill\forums\monks>perl ozboomer.pl
    $inbuf: >.u ♦ ►WV2 ĮI=427 Lamp dim state=2 [SCATS=0].<

            Format: C8a6C33C1
       $event_len: 2Ex = 46
        $date_str: 4/10/2017
        $time_str: 00:00:00
    $rec_type_idx: 10x = 16
     $source_name: !WV2   !
            $data: !128 73 61 52 50 55 32 76 97 109 112 32 100 105 109 32 115 116 97 116 101 61 50 32 91 83 67 65 84 83 61 48 93!
        $tail_len: 2Ex = 46

            Format: C8A6(A)33C1
       $event_len: 2Ex = 46
        $date_str: 4/10/2017
        $time_str: 00:00:00
    $rec_type_idx: 10x = 16
     $source_name: !WV2!
            $data: !Į I = 4 2 7 L a m p d i m s t a t e = 2 [ S C A T S = 0 ]!
        $tail_len: 2Ex = 46

I recommend that you do return $data as a string (use a33). Yes, you do need additional processing, but string processing is what perl does best.

Bill
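As a rough illustration of the "keep it as a string, process it afterwards" recommendation, the sketch below pulls `$data` out with `a33` and then derives other views of it on demand. This is an assumption about how the follow-up processing might look, not code from the thread; `@bytes`, `$hex` and `$text` are illustrative names, and `$inbuf` is rebuilt from the bytes in the question.

```perl
use strict;
use warnings;

# Sample record from the question: length byte, 46-byte body, length byte.
my $inbuf = "\x2E\x75\x0A\x04\x00\x00\x00\x10"
          . "WV2   "
          . "\x80I=427 Lamp dim state=2 [SCATS=0]"
          . "\x2E";

# 'x8' skips the eight leading header bytes; 'a' keeps each field intact,
# including any NULs or trailing spaces.
my ($source_name, $data) = unpack 'x8 a6 a33', $inbuf;

# View 1: the individual byte values, for records that hold binary data.
my @bytes = unpack 'C*', $data;

# View 2: a hex dump of the same bytes (lowercase hex digits).
my $hex = unpack 'H*', $data;

# View 3: a printable rendering, with non-printable bytes shown as '.'.
(my $text = $data) =~ s/[^\x20-\x7E]/./g;

printf "name=!%s! bytes=%d first=0x%02X\n", $source_name, scalar @bytes, $bytes[0];
printf "hex=%s\n", $hex;
printf "text=!%s!\n", $text;
```

Which view is appropriate would presumably depend on `$rec_type_idx`, as described earlier in the thread; the string form keeps all of them available.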
Re: Problems Getting the Template Correct when Using unpack() by ozboomer (Friar) on Oct 11, 2017 at 23:36 UTC
> Your specification of C33 returns a list of 33 unsigned numbers. You need an array to hold them. Another option is to use the array, but fill it with a list of characters ( use (a)33 ).

Ahhh..! Tha's the biggest clue, methinks... and something I was (half) mindful of - treating the list of bytes as a string when it really should be an array.

So, the way I'm looking to proceed now is along the following lines (the code sits behind a "Read more... (3 kB)" link in the original node).

Again, I'm very appreciative of everyone's help in helping me get around this annoyingly 'simple' problem... :)

