Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Unpack()ing a stream

by Stevie-O (Friar)
on May 24, 2004 at 14:29 UTC ( #355921=perlquestion: print w/replies, xml ) Need Help??
Stevie-O has asked for the wisdom of the Perl Monks concerning the following question:

Some (at least 40) of you may remember How much was unpack()ed?. Well, I came up with a solution and wrote a module that creates a streamable unpack(). Furthermore, it makes virtually no assumptions about how unpack() works, which means it should be compatible with future syntaxes! :)

Here's a module that implements it. Please suggest names appropriate for CPAN!

package StreamString; use strict; use warnings; =pod use StreamString; $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: $word\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: $word\n"; # Second word is: bar # Skip forward a byte. $str++; # also $str->seek(1, StreamString::SEEK_CUR) $word = $str->unpack('A4'); print "Third word is: $word\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was $word\n"; # No, wait it was $strpos = 0+$str; # god bless overload() $strpos = $str->tell; # if you don't like the previous syntax $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking $bar2 = $str->unpack('@4 A4'); # this works too =cut # not gonna pull in all of Fcntl for three constants use constant { SEEK_SET => 0, SEEK_CUR => 1, SEEK_END => 2 }; use overload fallback => 1, '0+' => 'tell', # calls 'tell' method '+' => sub { $_[0]->tell() + $_[1] }, '=' => 'copy', #sub{$_[0]->new($_[0], pos(${$_[0]} +))}, '+=' => sub { $_[0]->seek($_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '-=' => sub { $_[0]->seek(-$_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '""' => sub { ${+shift} }, # stringify 'bool' => sub { ! $_[0]->eof } , # false == end-of- +string; truth == not ; use Carp qw(croak); sub copy : method { # print "copy constructor called\n"; my $this = shift; $this->new($$this, pos($$this)); } sub new : method { my $class = shift; $class = ref($class) if ref($class); my $text = shift || ''; pos($text) = shift || 0; bless \$text, $class; } sub eof : method { my $this = shift; pos($$this) >= length($this) } sub tell : method { pos(${+shift}) } sub seek : method { croak "Not enough arguments to StreamString::seek()" unless @_ >= +3; my ($this, $where, $how) = @_; my $newpos = $how == SEEK_SET ? $where : $how == SEEK_CUR ? $where + pos($$this) : $how == SEEK_END ? length($$this) - $where : croak("Invalid WHENCE specified for StreamString::Se +ek ($how)!"); return pos($$this) = $newpos if $newpos>=0 && $newpos < length($$t +his); # range check return pos($$this); } sub unpack : method { my ($this, $str) = @_; my $p = pos($$this); # this is grossly inefficient. It requires making a (potentially +very large) # copy of the not-yet-unpacked version of the string. If anyone c +an tell me # how to make unpack return the number of bytes it's run through, +OR how to # work the internal APIs behind unpack (unpack_str/unpackstring) t +o make them # do it, please tell me! my @list = CORE::unpack("\@$p $str a*", $$this); # it's very simple. The last thing I return is (a*), which sucks +up the # whole remainder of the string. I figure out how long that is, a +nd from # there I can find out where unpack() stopped reading the data we +actually # wanted. my $remainder = pop @list; pos($$this) = length($$this) - length $remainder; return wantarray ? @list : $list[0]; } sub unpack_at : method { my ($this, $start, $str) = @_; $this->seek($start, SEEK_SET); return $this->unpack($str); } sub main::StreamStringTest { my ($str, $word, $strpos, $bar, $bar2); $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: '$word'\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: '$word'\n"; # Second word is: bar # Skip forward a byte. $str++; $word = $str->unpack('A4'); print "Third word is: '$word'\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was '$word'\n"; # No, wait it was baz $strpos = 0+$str; # god bless overload() print "strpos is $strpos\n"; $strpos = $str->tell; # if you don't like the previous syntax print "strpos is $strpos [2]\n"; $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking print "bar is '$bar'\n"; $bar2 = $str->unpack('@4 A4'); # this works too print "bar2 is '$bar2'\n"; } 1;
An interesting thing I discovered when writing this is that the copy constructor is sometimes called unnecessarily (i.e. when only one reference to an object exists and so no copy needs to be made).
$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Replies are listed 'Best First'.
Re: Unpack()ing a stream
by hardburn (Abbot) on May 24, 2004 at 14:33 UTC

    I'd put it in Parse::UnpackStream. I'm not sure that the Parse:: namespace is the best place, but I don't know of any better place for it. It certainly shouldn't be in the root namespace.

      Yeah, I know about the root namespace thing. All of my modules are in the root namespace while I write them, because then I don't need to concern myself with directories. If they work out and I decide they're CPAN-worthy, I then use h2xs to construct the final module directory and transplant the code.
      $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc
Re: Unpack()ing a stream
by diotalevi (Canon) on May 24, 2004 at 16:36 UTC

    Ok, that's more code than I can digest right now. Does it play well with formats like '(N/a*)*'?

    Oops. I originally wrote that as "(a*/N)*" which is wrong - the N comes before the a*. pack will tell you that if you get it wrong.

      It should work with any input unpack() itself normally works with. It's a simple (and inefficient) technique: It appends 'a*' to the end of the unpack string and pops off the last element returned. This will be the remainder of the string. A simple subtraction (length($fullstring) - length($remainder)) will reveal the point at which unpack() stopped unpacking the data the user actually asked for.
      $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

      What on earth does '(a*/N)*' do? unpack seems to be trying to use it, but I can't work out what kind of input to supply?

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

        When given to pack() it consumes a list of strings and returns a single string where every string is now prefixed with the length, encoded in a big-endian 32 bit value. Its fantastic for serializing a bunch of strings. Do note - I wrote the format backwards so the "N" comes before the "A*". pack will tell you this as well so you can't make the mistake and let it just get away from you.

        pack '(N/a*)*', list of strings; $foo = pack '(N/a*)*', "A", "BB", "CCC", "DDDD"; $foo eq ( "\0\0\0\1A" . "\0\0\0\2BB" . "\0\0\0\3CCC" . "\0\0\0\4DDDD" +);

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://355921]
Approved by bassplayer
Front-paged by bmcatt
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2017-03-29 23:07 GMT
Find Nodes?
    Voting Booth?
    Should Pluto Get Its Planethood Back?

    Results (353 votes). Check out past polls.