PerlJedi has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I have a binary file whose header content (in hex) is shown below. It is one single line in the file; the dump is wrapped here for display.
0001000000004f914f1c0b010042534341303000000005000043413030303030000000
014341303030313400000002434130303032320000000343413030303234000000044
34130303032390000000100224e43656c6c3000000002004253434130310000000500
054341303130303100000006434130313030340000000743413031303130000000084
341303130313500000009434130313031360000000100224e43656c6c300000000300
42534341303200000006000a
This data should be unpacked to populate the following structure.
Header Record Id : 1 Byte
File Format Version : 1 Byte
Timestamp : 8 Bytes
No. of BSCs : 1 Byte
For each BSC ...
    BSC Id : 1 Byte
    Application Version : 1 Byte
    BSC Name : 2 Bytes
    Number of Cells : 2 Bytes
    For each Cell ...
        Cell Pointer : 2 Bytes
        Cell Name : 9 Bytes
    Number of Neighbour Cells : 2 Bytes
    For each Neighbour Cell to this BSC ...
        Cell Pointer : 2 Bytes
        Cell Name : 9 Bytes
Here is what I have been trying...
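#!/usr/bin/perl
use strict;
use warnings;

my $file = "A20120420.1257+0100-1302+0100_group0.bin";
open(my $fh, '<', $file) or die "Cannot open $file: $!";
binmode($fh);
# Slurp the file: <$fh> in line mode would stop at the first 0x0A byte
my $headder = do { local $/; <$fh> };
my ($headderRecordId, $fileFormatId, $timeStamp, $noOfBSCs, $specOfBSCs)
    = unpack("H2 H2 H16 H2 H*", $headder);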
No doubt, I can decode the binary file this way. But here is what I prefer:
1. I need to convert $noOfBSCs to decimal, use it as a loop count, and then decode the data for the rest of the BSCs. The same would apply to the Cells.
> Is there a way that I can work without converting the read hex value in loops?
2. I see that a dynamic unpack is also possible, but I could not understand how it works.
Since performance is a criterion, I don't want to code with C kind of logic. Please let me know how this can be done better in Perl.
Thanks in advance,
PerlJedi
Re: Handling Hex data with Dynamic unpack
by Marshall (Canon) on Jul 05, 2012 at 11:16 UTC
By printing this out in ASCII hex, you have actually changed the problem.
When you do a binmode read to a scalar, you get a $buffer where each "character" is a byte (0-255 unsigned). For actual bytes there is "nothing to be done" - it is already a byte value. For multi-byte fields some sort of unpack() is usually necessary. Use substr() to get a range of byte values. Use unpack() to convert sequences of bytes into some other representation (from little endian to big endian or whatever).
my $HeaderRecordId = substr($buf,0,1);
my $FileFormatVersion = substr($buf,1,1);
my $TimeStamp = substr($buf,2,8); #some kind of unpack needed here!
my $NumBSC = substr($buf,10,1);
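$HeaderRecordId, $FileFormatVersion and $NumBSC are just bytes, and nothing more is needed past substr() to extract them. Each one is a one-character string, so apply ord() when you need its numeric value.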
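A small sketch of that last step, in case it helps. The 11-byte buffer here is made up with pack for illustration (its values are assumptions, not the real file); the point is only that substr() hands back a one-character string, and ord() or unpack 'C' turns it into a number that can drive a loop.

```perl
use strict;
use warnings;

# Hypothetical 11-byte header built for illustration (values are assumptions).
my $buf = pack 'C C N N C', 0, 1, 0, 0x4f914f1c, 11;

my $raw_count = substr($buf, 10, 1);     # one-character string "\x0b"
my $NumBSC    = ord $raw_count;          # numeric value of that byte
my $same      = unpack 'C', $raw_count;  # equivalent result via unpack

print "NumBSC = $NumBSC\n";              # prints "NumBSC = 11"
```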
Update: I looked back at some ancient code (I don't deal with binary very often); this one had to do with .WAV files. My point is that substr() will get you the sequence of bytes. Here, I look for "RIFF" and "data" with string compares. The V unpack is for little endian conversion (V already reads a full 32-bit value, so it needs no repeat count).
code snippet...
# myprint() is a print helper defined elsewhere in the original script
read(IN, my $buff, 1 * 2**10);
(substr($buff,0,4) eq "RIFF") || die "not a valid RIFF file";
my $size = unpack ("V",substr($buff,4,4));
myprint (" RIFF segment size = $size");
(substr($buff,50,4) eq "data")|| die "data segment not found";
my $dsize = unpack ("V",substr($buff,54,4));
myprint (" DATA Segment size = $dsize");
binmode does not work on a scalar, only a file handle. substr can be either in utf8 mode or byte mode, and substr doesn't guarantee the returning scalar is in one or the other mode.
Re: Handling Hex data with Dynamic unpack
by Anonymous Monk on Jul 05, 2012 at 09:59 UTC
No doubt, I can decode the binary file this way.
I don't know many binary files represented as hex text, so that won't work.
Also, the format specification doesn't specify endianness, so the spec seems incomplete.
I don't want to code with C kind of logic
:) But but but but, aren't you dealing with C-kind of data?
Maybe Convert::Binary::C can help?
It is a BCD encoded binary file which is read in binary() mode.
It follows a Big Endian notation.
By C kind of logic, I meant the usual way of using loops etc.
It is a BCD encoded binary file which is read in binary() mode.
What does that mean?
By C kind of logic, I meant the usual way of using loops etc.
I think first you have to write some loops :)
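For what it's worth, the loops can stay short. Here is a rough sketch under assumptions the thread does not confirm: big-endian counts (as stated above), the field order from the posted layout, and an 8-byte NUL-padded BSC name (the posted layout says 2 bytes, but the dump appears to hold "BSCA00" padded to 8 bytes before the cell count). The sample buffer is made up with pack for illustration. Each count read from the data is interpolated into the next unpack template, which is one way to make the template "dynamic".

```perl
use strict;
use warnings;

# Made-up sample in the assumed layout: header, then one BSC record.
my $buf = pack('C C N N C', 0, 1, 0, 0x4f914f1c, 1)      # ids, timestamp, 1 BSC
        . pack('C C a8 n', 1, 0, 'BSCA00', 2)            # BSC with 2 cells
        . pack('(n a9)2', 0, 'CA00000', 1, 'CA00014')    # cell pointer + name
        . pack('n n a9', 1, 0x22, 'NCell0');             # 1 neighbour cell

my ($recId, $version, $tsHi, $tsLo, $numBSC) = unpack 'C C N N C', $buf;
my $pos = 11;                                            # bytes consumed so far
my @bscs;
for (1 .. $numBSC) {
    # x$pos skips the bytes already parsed; Z8/Z9 stop at the first NUL.
    my ($id, $app, $name, $numCells) = unpack "x$pos C C Z8 n", $buf;
    $pos += 12;
    my @cells = unpack "x$pos (n Z9)$numCells", $buf;    # count goes into the template
    $pos += 11 * $numCells;
    my ($numNbr) = unpack "x$pos n", $buf;
    $pos += 2;
    my @nbrs = unpack "x$pos (n Z9)$numNbr", $buf;
    $pos += 11 * $numNbr;
    push @bscs, { id => $id, app => $app, name => $name,
                  cells => \@cells, neighbours => \@nbrs };
}
print "$_->{name}: @{ $_->{cells} }\n" for @bscs;        # BSCA00: 0 CA00000 1 CA00014
```

The cells and neighbours come back as flat (pointer, name) lists; pairing them up into hashes is left out to keep the sketch small.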
Re: Handling Hex data with Dynamic unpack
by Anonymous Monk on Jul 05, 2012 at 11:06 UTC
This is how I would start, but I'd need a more complete spec.
#!/usr/bin/perl --
use strict; use warnings;
use Data::Dump;
my $rawAsHex = join '', qw(
    0001000000004f914f1c0b0100425343413030000000050000434
    130303030300000000143413030303134000000024341303030323200000003434130
    3030323400000004434130303032390000000100224e43656c6c30000000020042534
    341303100000005000543413031303031000000064341303130303400000007434130
    31303130000000084341303130313500000009434130313031360000000100224e436
    56c6c30000000030042534341303200000006000a
);
my $raw = pack 'H*', $rawAsHex;
my @stack = unpack q{
A2 ### Header Record Id : 1 Byte
A2 ### File Format Version : 1 Byte
A16 ### Timestamp : 8 Bytes
A2 ### No. of BSCs : 1 Byte
### For each BSC ...
A2 ### BSC Id : 1 Byte
A2 ### Application Version : 1 Byte
A4 ### BSC Name : 2 Bytes
A4 ### Number of Cells : 2 Bytes
### For each Cell ...
A4 ### Cell Pointer : 2 Bytes
A18 ### Cell Name : 9 Bytes
A4 ### Number of Neighbour Cells : 2 Bytes
### For each Neighbour Cell to this BSC ...
A4 ### Cell Pointer : 2 Bytes
A18 ### Cell Name : 9 Bytes
}, $rawAsHex;
my $id = unpack 'C*', pack 'H*', $stack[0];
my $version = unpack 'C*', pack 'H*', $stack[1];
my $time = unpack 'H*', pack 'H*', $stack[2]; ## WHAT?!
dd [ $id, $version, $time , \@stack ];
__END__
[
0,
1,
"000000004f914f1c",
[
"00",
"01",
"000000004f914f1c",
"0b",
"01",
"00",
4253,
4341,
3030,
"000000050000434130",
3030,
3030,
"000000014341303030",
],
]
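On the "## WHAT?!" timestamp: one possible reading, purely an assumption since the spec does not say, is a big-endian 64-bit count of seconds since the Unix epoch. A minimal sketch under that assumption, splitting the 8 bytes into two 32-bit halves so it does not depend on 64-bit unpack support:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $raw = pack 'H*', '000000004f914f1c';   # the 8 timestamp bytes from the dump
my ($hi, $lo) = unpack 'N N', $raw;        # two big-endian 32-bit halves
my $epoch = $hi * 2**32 + $lo;             # assumption: seconds since 1970-01-01 UTC
print strftime('%Y-%m-%d %H:%M:%S UTC', gmtime $epoch), "\n";
```

Under that assumption the value comes out as 2012-04-20 11:57:16 UTC, which at least lines up with the date in the file name from the original post.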
Somehow Perl doesn't allow me to unpack like this (look at H2 H2 H16 H2 A2 ...):
my @stack = unpack q{
H2 ### Header Record Id : 1 Byte
H2 ### File Format Version : 1 Byte
H16 ### Timestamp : 8 Bytes
H2 ### No. of BSCs : 1 Byte
### For each BSC ...
A2 ### BSC Id : 1 Byte
A2 ### Application Version : 1 Byte
A4 ### BSC Name : 2 Bytes
A4 ### Number of Cells : 2 Bytes
### For each Cell ...
A4 ### Cell Pointer : 2 Bytes
A18 ### Cell Name : 9 Bytes
A4 ### Number of Neighbour Cells : 2 Bytes
### For each Neighbour Cell to this BSC ...
A4 ### Cell Pointer : 2 Bytes
A18 ### Cell Name : 9 Bytes
}, $rawAsHex;
Any idea how this could be done?
I get an output like this:
[
48,
48,
"3031303030303030",
[
30,
30,
"3031303030303030",
30,
"04",
"f9",
"14f1",
"c0b0",
1004,
"253434130300000000",
5000,
"0434",
"130303030300000000",
],
]
Some how perl doesn't allow me to unpack like this (look at H2 H2 H16 H2 A2 ...) ... Any idea how this could be done?
I showed you how. What you're dealing with is a text string, so if you want bytes, you have to pack them. First pack the hex text to get bytes ( pack 'H*' ), then unpack those bytes to get what you're really after ( C, an unsigned char (octet) value ) .....
Commands
perl -le " print unpack q{H*}, q{Y} "
perl -le " print pack q{H*}, q{59} "
perl -le " print ord q{Y} "
perl -le " print pack q{H*}, q{59} "
perl -le " print unpack q{C}, pack q{H*}, q{59} "
Session
$ perl -le " print unpack q{H*}, q{Y} "
59
$ perl -le " print pack q{H*}, q{59} "
Y
$ perl -le " print ord q{Y} "
89
$ perl -le " print pack q{H*}, q{59} "
Y
$ perl -le " print unpack q{C}, pack q{H*}, q{59} "
89
Y encoded as hex is 59
The numeric value ( ord ) of Y is 89
The C value (an unsigned char, i.e. one 8-bit octet) of Y is 89
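Applied to the header from the original post, the same two steps might look like the sketch below. It uses only the first 11 bytes of the posted dump and the field widths from the posted layout; treating the count fields as plain unsigned bytes is an assumption.

```perl
use strict;
use warnings;

my $hexText = '0001000000004f914f1c0b';    # first 11 bytes of the posted dump, as hex text
my $raw     = pack 'H*', $hexText;         # step 1: hex text -> real bytes

# step 2: binary template on the bytes; C gives numbers, H16 keeps the timestamp as hex
my ($id, $version, $time, $numBSC) = unpack 'C C H16 C', $raw;

print "$id $version $time $numBSC\n";      # prints "0 1 000000004f914f1c 11"
```

When reading the real file with binmode there is no hex text to begin with, so step 1 drops out and the unpack runs directly on the buffer.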