
How much was unpack()ed?

by Stevie-O (Friar)
on May 21, 2004 at 14:28 UTC ( #355284=perlquestion )
Stevie-O has asked for the wisdom of the Perl Monks concerning the following question:

I thought this was a simple problem, but apparently it's tougher than I first thought.

I need to unpack() a pile of data that's in a flexible format. This data has a lot of metadata, e.g. length-of-string, type-of-next-value, etc. One of the problems is that I often don't know *what* I'm unpacking until just before I need to unpack it.

For example, let's say one of the values was a 'type' byte. 0=byte, 1=short, 2=long, 3=ASCIIZ. I need to do this:

    $t = unpack('C', $foo);
    if    ($t == 0) { $v = unpack('C',  substr($foo, 1)) }
    elsif ($t == 1) { $v = unpack('S',  substr($foo, 1)) }
    elsif ($t == 2) { $v = unpack('L',  substr($foo, 1)) }
    elsif ($t == 3) { $v = unpack('Z*', substr($foo, 1)) }

Now, that's not so hard. The problem arises when I have to fetch the NEXT value. For the first three cases it's easy (fixed lengths), but the fourth is variable-length. How can I easily find out where unpack() finished, so I can pick up where it left off?

Some might offer 'well, use length($v) + 1 for Z*'. Well, that works for Z*. But what about things like $v = unpack('w', $foo)? 'w' is a variable-length encoded integer value. How do I know whether it was encoded with 1, 2, 3, or 4 bytes? Better yet, try several things at once: unpack('(wZ*w)5', $foo)

$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Replies are listed 'Best First'.
Re: How much was unpack()ed?
by NetWallah (Canon) on May 21, 2004 at 16:17 UTC
    This problem has 2 aspects:
    • program structure
    • Unpacking variable data
    Take a look at how the NetPacket::* modules reduce programming complexity - they address the same 2 issues.

    Offense, like beauty, is in the eye of the beholder, and a fantasy.
    By guaranteeing freedom of expression, the First Amendment also guarantees offense.
Re: How much was unpack()ed?
by meredith (Friar) on May 21, 2004 at 18:52 UTC

    Slightly off-topic: Your code block here:

        $t = unpack('C', $foo);
        if    ($t == 0) { $v = unpack('C',  substr($foo, 1)) }
        elsif ($t == 1) { $v = unpack('S',  substr($foo, 1)) }
        elsif ($t == 2) { $v = unpack('L',  substr($foo, 1)) }
        elsif ($t == 3) { $v = unpack('Z*', substr($foo, 1)) }

    could also be this:

        my @unpack_types = qw( C S L Z* );
        $t = unpack('C', $foo);
        $v = unpack($unpack_types[$t], substr($foo, 1)); # Is $t always a number?

    You would probably do the same thing later on, when you take another look at this code :) I know it's not much savings for only four types, but I don't know what you're working on -- you could have 12! :)
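    A minimal sketch of that table lookup with a guard for unknown type bytes. The helper name decode_value() and the sample data are assumptions for illustration; the type numbering (0=byte, 1=short, 2=long, 3=ASCIIZ) is taken from the question.

```perl
use strict;
use warnings;

# Type byte => unpack template, following the numbering in the question.
my @unpack_types = qw( C S L Z* );

# decode_value() is a hypothetical helper, not from the original post.
# It reads the leading type byte and unpacks the value that follows it.
sub decode_value {
    my ($foo) = @_;
    my $t = unpack 'C', $foo;
    die "unknown type byte $t\n" unless defined $unpack_types[$t];
    return unpack $unpack_types[$t], substr($foo, 1);
}

print decode_value(pack 'C S', 1, 513), "\n";        # a short
print decode_value(pack 'C Z*', 3, 'hello'), "\n";   # an ASCIIZ string
```

    This only decodes a single value; it does not say where the value ended, which is the actual question.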

    mhoward - at -
Re: How much was unpack()ed?
by ambrus (Abbot) on May 21, 2004 at 16:11 UTC

    The problem here seems to be that unpack does not have a formatter similar to %n of scanf. You might try to use a regexp instead for matching a zero-terminated string, while still using unpack for the other types. Then, you can use pos or $+[0] to find out how much you've read.
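    The regexp idea might look roughly like this. It is a sketch under assumptions: the sample buffer is made up, and the string is matched with \G and /gc so that pos() records where the terminating NUL ended.

```perl
use strict;
use warnings;

# Made-up sample: type 3 (ASCIIZ) "nine", then type 1 (short) 513.
my $buf = pack 'C Z* C S', 3, 'nine', 1, 513;

my $pos = 1;            # offset just past the leading type byte
pos($buf) = $pos;

# Match up to and including the NUL, starting exactly at pos($buf).
$buf =~ /\G([^\0]*)\0/gc or die "no terminating NUL\n";
my $v = $1;
$pos = pos($buf);       # same value as $+[0]

# Fixed-size fields can still go through unpack, skipping $pos bytes first.
my ($t, $n) = unpack "x$pos C S", $buf;
print "'$v' ends at offset $pos; next type $t, value $n\n";
```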

Re: How much was unpack()ed?
by Anonymous Monk on May 21, 2004 at 17:09 UTC
    To "easily find out the place where unpack() finished" you could simply consume the string as you go:
        (my($t),$foo) = unpack('Ca*', $foo);
        if ( $t == 3 ) {
            ($v,$foo) = unpack('Z*a*', $foo);
        }

    Unfortunately Z* doesn't finish at the zero byte - it actually consumes the remainder of the input and discards everything beyond the zero.

      If the fields are prefixed with length bytes, then you can prevent 'a*' (or 'A*' or 'Z*') from consuming the rest of the string by telling the format that the length byte is there.

          $x = pack 'c/a*N', '12345123451234512345', 99999;
          print for unpack 'c/a* N', $x;

          12345123451234512345
          99999

      The downside is that the length byte is then consumed. To work around that, you have to unpack the length byte twice.

          print for unpack 'cXc/a* N', $x;

          20
          12345123451234512345
          99999

      However, I don't see how this helps the OP as his data has a 'type' byte but no 'length' byte.

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

      What? It seems to me that Z* does stop at the zero byte, it's only Z with a finite number that does not stop. Look.

          $ perl -we '($a,$b)= unpack "Z*a*", "nine\0eight"; warn ">>$a<< >>$b<<";'
          >>nine<< >>eight<< at -e line 1.
          $ perl -we '($a,$b)= unpack "Z8a*", "nine\0eight"; warn ">>$a<< >>$b<<";'
          >>nine<< >>ht<< at -e line 1.

      This was with perl, v5.8.1 built for i686-linux.

      Update: just checked, with perl 5.6.1, I get the wrong behaviour, that is, Z* consumes all the string.
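      Given a perl where Z* stops at the zero byte (5.8.1 per the test above, but not 5.6.1), a trailing a* can double as a measuring stick: whatever it returns is the part the rest of the template did not use. A small sketch with made-up data, which also covers the 'w' case from the question:

```perl
use strict;
use warnings;

# Made-up sample: a BER integer, an ASCIIZ string, then unrelated bytes.
my $data = pack('w', 300) . pack('Z*', 'nine') . 'tail';

# Capture the unread remainder with a*, then subtract lengths.
my ($w, $rest1) = unpack 'w a*', $data;
my $used_w = length($data) - length($rest1);

my ($w2, $z, $rest2) = unpack 'w Z* a*', $data;
my $used_wz = length($data) - length($rest2);

print "w used $used_w byte(s) for $w\n";
print "w Z* used $used_wz byte(s), leaving '$rest2'\n";
```

      The cost is that each call copies the remainder of the string, which may matter for large inputs.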

Re: How much was unpack()ed? (stream)
by tye (Sage) on May 22, 2004 at 04:34 UTC

    Whenever I do much with pack/unpack, I'm reminded that these really need to support stream operations.

    unpack needs to be able to read from a stream or, more likely, to start reading at pos($input) and to set pos($input) to note where it left off (in other words, treat the input string as a stream).

    You can already do print STREAM pack ... so pack doesn't really need to be able to write to a stream. But it'd be cool if pack supported starting at pos($string), overwriting as many bytes after that as needed, and setting pos($string) to where it left off.

    I even started writing a module to implement such. But I didn't finish and now pack/unpack have gotten quite a bit fancier such that this either needs to be patched directly into pack/unpack or (at least) they need some introspection features added to make implementing these in a module reasonable/possible.

    For example "." could be the format for "current seek offset" which you could use like:

        my( $z, $zEnd, $i, $iEnd )= unpack "z.I.", $buf;
        # $zEnd == length($z)
        # $iEnd == 4 + $zEnd

        my( $z, $i )= getData();
        my $buf= pack "z.I.", $z, my $zEnd, $i, my $iEnd;
        # $zEnd == length($z)
        # $iEnd == 4 + $zEnd

    - tye        
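    Until unpack itself treats its input as a stream, the behaviour can be approximated in plain Perl by carrying an offset alongside the buffer. The sketch below is not the module mentioned above; unpack_from() is a hypothetical helper, the type table comes from the original question, and the sample records are made up. It assumes a perl where Z* stops at the zero byte.

```perl
use strict;
use warnings;

# unpack_from() is a hypothetical helper, not a core function. It unpacks
# $template starting at offset $$pos_ref and then advances $$pos_ref past
# the bytes that were read, measured via a trailing a*.
sub unpack_from {
    my ($template, $buf_ref, $pos_ref) = @_;
    my @values = unpack "x${$pos_ref} $template a*", $$buf_ref;
    my $rest   = pop @values;
    $$pos_ref  = length($$buf_ref) - length($rest);
    return @values;
}

# Type byte => template: 0=byte, 1=short, 2=long, 3=ASCIIZ.
my @templates = qw( C S L Z* );

# Made-up sample: four type-tagged values back to back.
my $buf = pack 'C C  C S  C Z*  C L', 0, 7, 1, 513, 3, 'nine', 2, 70000;

my $pos = 0;
my @decoded;
while ($pos < length $buf) {
    my ($t) = unpack_from('C', \$buf, \$pos);
    die "unknown type byte $t\n" unless defined $templates[$t];
    my ($v) = unpack_from($templates[$t], \$buf, \$pos);
    push @decoded, $v;
}
print "@decoded\n";
```

    Passing the buffer and offset by reference keeps the caller's loop short, at the price of copying the unread tail on every call.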

Node Type: perlquestion [id://355284]
Approved by bmcatt
Front-paged by NetWallah