Stevie-O has asked for the wisdom of the Perl Monks concerning the following question:
I thought this was a simple problem, but apparently it's tougher than I first thought.
I need to unpack() a pile of data that's in a flexible format. This data has a lot of metadata, e.g. length-of-string, type-of-next-value, etc. One of the problems is that I often don't know *what* I'm unpacking until just before I need to unpack it.
For example, let's say one of the values was a 'type' byte. 0=byte, 1=short, 2=long, 3=ASCIIZ.
I need to do this:
$t = unpack('C', $foo);
if ($t == 0) { $v = unpack('C', substr($foo, 1)) }
elsif ($t == 1) { $v = unpack('S', substr($foo, 1)) }
elsif ($t == 2) { $v = unpack('L', substr($foo, 1)) }
elsif ($t == 3) { $v = unpack('Z*', substr($foo, 1)) }
Now, that's not so hard. The problem arises when I have to fetch the NEXT value. For the first three types it's easy (fixed lengths), but the fourth is variable-length. How can I easily find out where unpack() finished, so I can pick up where it left off?
Some might offer 'well, use length($v) + 1 for Z*'. Well, that works for Z*. But what about things like $v = unpack('w', $foo)? 'w' is a variable-length (BER) encoded integer value. How do I know whether it was encoded with 1, 2, 3, or 4 bytes? Better yet, try several at once: unpack('(wZ*w)5', $foo).
--Stevie-O
$"=$,,$_=q>|\p4<6 8p<M/_|<('=>
.q>.<4-KI<l|2$<6%s!<qn#F<>;$,
.=pack'N*',"@{[unpack'C*',$_]
}"for split/</;$_=$,,y[A-Z a-z]
{}cd;print lc
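One possible answer, as a minimal sketch that assumes a perl newer than the ones discussed in this thread (the `.` unpack template is, as far as I know, available from 5.10 onward): `.*` reads no input and instead returns the offset reached so far, counted from the start of the string, so appending it to a template tells you exactly how many bytes were consumed. The helper name `ber_width` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one BER integer ('w') from the start of $buf
# and report how many bytes its encoding occupied.
sub ber_width {
    my ($buf) = @_;
    # '.*' reads nothing; it pushes the offset reached so far,
    # counted from the start of $buf.
    my ($value, $used) = unpack 'w .*', $buf;
    return ($value, $used);
}

for my $n (127, 128, 16383, 16384) {
    my ($value, $used) = ber_width(pack('w', $n) . 'trailing data');
    print "$value took $used byte(s)\n";
}

# The same trick works after a repeated group such as '(wZ*w)5':
# the last element returned is the total number of bytes consumed.
my $rec    = pack '(w Z* w)2', 300, 'abc', 5, 70000, 'de', 1;
my @fields = unpack '(w Z* w)2 .*', $rec;
my $end    = pop @fields;
print "fields: @fields; consumed $end of ", length($rec), " bytes\n";
```

The loop should report 1, 2, 2 and 3 bytes respectively, since BER stores 7 bits per byte and the trailing data is left untouched.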
Re: How much was unpack()ed?
by NetWallah (Canon) on May 21, 2004 at 16:17 UTC
This problem has 2 aspects:
- program structure
- unpacking variable data
Take a look at how the NetPacket::* modules reduce programming complexity; they address the same 2 issues.
Offense, like beauty, is in the eye of the beholder, and a fantasy.
By guaranteeing freedom of expression, the First Amendment also guarantees offense.
Re: How much was unpack()ed?
by meredith (Friar) on May 21, 2004 at 18:52 UTC
$t = unpack('C', $foo);
if ($t == 0) { $v = unpack('C', substr($foo, 1)) }
elsif ($t == 1) { $v = unpack('S', substr($foo, 1)) }
elsif ($t == 2) { $v = unpack('L', substr($foo, 1)) }
elsif ($t == 3) { $v = unpack('Z*', substr($foo, 1)) }
could also be this:
my @unpack_types = qw( C S L Z* );
$t = unpack('C', $foo);
$v = unpack($unpack_types[$t], substr($foo, 1)); # Is $t always a number?
You would probably end up doing the same thing later on anyway, when you take another look at this code :) I know it's not much of a saving for only four types, but I don't know what you're working on -- you could have 12! :)
mhoward - at - hattmoward.org
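Building on that lookup table, here is a minimal sketch that walks a whole buffer of type-tagged values, assuming the four type codes from the original question and a perl that supports `.` in unpack templates (5.10 or later, I believe). `@` jumps to an absolute offset before reading and `.*` reports where reading stopped, so the buffer is never copied or shortened. The name `decode_all` is hypothetical; 'S' and 'L' use native byte order, as in the question.

```perl
use strict;
use warnings;

# Type byte -> unpack template (0=C, 1=S, 2=L, 3=Z*), as in the reply above.
my @unpack_types = qw( C S L Z* );

# Hypothetical helper: decode every (type, value) pair in $buf by
# tracking the current offset explicitly.
sub decode_all {
    my ($buf) = @_;
    my ($off, @values) = (0);
    while ($off < length $buf) {
        # '@' jumps to the absolute offset; '.*' reports the offset reached.
        my ($t, $next) = unpack "\@$off C .*", $buf;
        my $tmpl = $unpack_types[$t];
        die "unknown type $t at offset $off\n" unless defined $tmpl;
        my ($v, $after) = unpack "\@$next $tmpl .*", $buf;
        # A truncated fixed-width field yields no value, only the offset.
        die "truncated value at offset $next\n" unless defined $after;
        push @values, $v;
        $off = $after;
    }
    return @values;
}

my $buf = pack 'C C  C S  C Z*  C L', 0, 200, 1, 5000, 3, 'hello', 2, 123456;
print join(', ', decode_all($buf)), "\n";   # 200, 5000, hello, 123456
```

Adding a new type then only means adding one template to `@unpack_types`.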
Re: How much was unpack()ed?
by ambrus (Abbot) on May 21, 2004 at 16:11 UTC
The problem here seems to be that unpack does not have a format similar to scanf's %n. You might try using a regexp to match the zero-terminated string, while still using unpack for the other types. Then you can use pos or $+[0] to find out how much you've read.
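A minimal sketch of that approach, assuming the type codes from the original question: a `\G`-anchored match with `/gc` reads the NUL-terminated string and leaves `pos` just past the terminator, while unpack handles the fixed-width types starting at an explicit `@` offset. The helper name `read_value` and the `%fixed` table are hypothetical.

```perl
use strict;
use warnings;

# Fixed-width templates and their sizes in bytes for type codes 0..2.
my %fixed = (0 => ['C', 1], 1 => ['S', 2], 2 => ['L', 4]);

# Hypothetical helper: read one (type, value) pair starting at
# pos($$bufref) and leave pos($$bufref) just past it.
sub read_value {
    my ($bufref) = @_;
    my $off = pos($$bufref) // 0;
    my $t = unpack "\@$off C", $$bufref;
    $off++;
    if (my $f = $fixed{$t}) {
        my ($tmpl, $size) = @$f;
        pos($$bufref) = $off + $size;
        return unpack "\@$off $tmpl", $$bufref;
    }
    die "unknown type $t\n" unless $t == 3;
    pos($$bufref) = $off;
    # \G anchors the match at pos(); /c keeps pos() intact if it fails.
    $$bufref =~ /\G([^\0]*)\0/gc or die "unterminated string at $off\n";
    return $1;   # pos($$bufref) (and $+[0]) now point just past the NUL
}

my $buf = pack('C Z*', 3, 'abc') . pack('C S', 1, 513);
my @out;
push @out, read_value(\$buf) while (pos($buf) // 0) < length $buf;
print "@out\n";
```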
Re: How much was unpack()ed?
by Anonymous Monk on May 21, 2004 at 17:09 UTC
To "easily find out the place where unpack() finished" you could simply consume the string as you go:
(my($t),$foo) = unpack('Ca*', $foo);
if ( $t == 3 ) {
($v,$foo) = unpack('Z*a*', $foo);
}
Unfortunately, Z* doesn't finish at the zero byte: it actually consumes the remainder of the input and discards everything beyond the zero.
$x = pack 'c/a*N', '12345123451234512345', 99999;
print for unpack 'c/a* N', $x;
12345123451234512345
99999
The downside is that the length byte is then consumed. To work around that, you have to unpack the length byte twice.
print for unpack 'cXc/a* N', $x;
20
12345123451234512345
99999
However, I don't see how this helps the OP as his data has a 'type' byte but no 'length' byte.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
$ perl -we '($a,$b)= unpack "Z*a*", "nine\0eight"; warn ">>$a<< >>$b<<";'
>>nine<< >>eight<< at -e line 1.
$ perl -we '($a,$b)= unpack "Z8a*", "nine\0eight"; warn ">>$a<< >>$b<<";'
>>nine<< >>ht<< at -e line 1.
This was with perl, v5.8.1 built for i686-linux.
Update: I just checked, and with perl 5.6.1 I get the wrong behaviour, that is, Z* consumes the whole string.
Re: How much was unpack()ed? (stream)
by tye (Sage) on May 22, 2004 at 04:34 UTC
Whenever I do much with pack/unpack, I'm reminded that these really need to support stream operations.
unpack needs to be able to read from a stream or, more likely, to start reading at pos($input) and to set pos($input) to note where it left off (in other words, treat the input string as a stream).
You can already do print STREAM pack ... so pack doesn't really need to be able to write to a stream. But it'd be cool if pack supported starting at pos($string), overwriting as many bytes after that as needed, and setting pos($string) to where it left off.
I even started writing a module to implement such. But I didn't finish and now pack/unpack have gotten quite a bit fancier such that this either needs to be patched directly into pack/unpack or (at least) they need some introspection features added to make implementing these in a module reasonable/possible.
For example, "." could be the format for "current seek offset", which you could use like this:
my( $z, $zEnd, $i, $iEnd )= unpack "z.I.", $buf;
# $zEnd == length($z)
# $iEnd == 4 + $zEnd
my( $z, $i )= getData();
my $buf= pack "z.I.", $z, my $zEnd, $i, my $iEnd;
# $zEnd == length($z)
# $iEnd == 4 + $zEnd
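Perls newer than this thread (5.10 and later, I believe) accept `.` in unpack templates with roughly the meaning proposed here: it returns the current offset instead of reading data. That makes the pos()-based stream interface easy to approximate in plain Perl. A minimal sketch; `unpack_stream` is a hypothetical name. Note that a real `Z*` consumes the terminating NUL, so the offset right after a string is `length($z) + 1`.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack $template starting at pos($$bufref), then
# advance pos($$bufref) to the offset where unpack stopped.
sub unpack_stream {
    my ($template, $bufref) = @_;
    my $start = pos($$bufref) // 0;
    # '@' jumps to the absolute start offset; the trailing '.*' returns
    # the end offset (from the start of the string) as the last value.
    my @vals = unpack "\@$start $template .*", $$bufref;
    pos($$bufref) = pop @vals;
    return @vals;
}

my $buf = pack 'Z* L Z* L', 'first', 1, 'second', 2;
while ((pos($buf) // 0) < length $buf) {
    my ($name, $num) = unpack_stream('Z* L', \$buf);
    print "$name => $num (now at offset ", pos($buf), ")\n";
}
```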
Re: How much was unpack()ed?
by rikkuru (Initiate) on Apr 01, 2020 at 11:30 UTC
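($t, $foo) = unpack('Ca*', $foo);
if    ($t == 0) { ($v, $foo) = unpack('Ca*',  $foo) }
elsif ($t == 1) { ($v, $foo) = unpack('Sa*',  $foo) }
elsif ($t == 2) { ($v, $foo) = unpack('La*',  $foo) }
elsif ($t == 3) { ($v, $foo) = unpack('Z*a*', $foo) }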