Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Unpack()ing a stream

by Stevie-O (Friar)
on May 24, 2004 at 14:29 UTC ( #355921=perlquestion: print w/ replies, xml ) Need Help??
Stevie-O has asked for the wisdom of the Perl Monks concerning the following question:

Some (at least 40) of you may remember How much was unpack()ed?. Well, I came up with a solution and wrote a module that creates a streamable unpack(). Furthermore, it makes virtually no assumptions about how unpack() works, which means it should be compatible with future syntaxes! :)

Here's a module that implements it. Please suggest names appropriate for CPAN!

package StreamString; use strict; use warnings; =pod use StreamString; $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: $word\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: $word\n"; # Second word is: bar # Skip forward a byte. $str++; # also $str->seek(1, StreamString::SEEK_CUR) $word = $str->unpack('A4'); print "Third word is: $word\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was $word\n"; # No, wait it was $strpos = 0+$str; # god bless overload() $strpos = $str->tell; # if you don't like the previous syntax $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking $bar2 = $str->unpack('@4 A4'); # this works too =cut # not gonna pull in all of Fcntl for three constants use constant { SEEK_SET => 0, SEEK_CUR => 1, SEEK_END => 2 }; use overload fallback => 1, '0+' => 'tell', # calls 'tell' method '+' => sub { $_[0]->tell() + $_[1] }, '=' => 'copy', #sub{$_[0]->new($_[0], pos(${$_[0]} +))}, '+=' => sub { $_[0]->seek($_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '-=' => sub { $_[0]->seek(-$_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '""' => sub { ${+shift} }, # stringify 'bool' => sub { ! $_[0]->eof } , # false == end-of- +string; truth == not ; use Carp qw(croak); sub copy : method { # print "copy constructor called\n"; my $this = shift; $this->new($$this, pos($$this)); } sub new : method { my $class = shift; $class = ref($class) if ref($class); my $text = shift || ''; pos($text) = shift || 0; bless \$text, $class; } sub eof : method { my $this = shift; pos($$this) >= length($this) } sub tell : method { pos(${+shift}) } sub seek : method { croak "Not enough arguments to StreamString::seek()" unless @_ >= +3; my ($this, $where, $how) = @_; my $newpos = $how == SEEK_SET ? $where : $how == SEEK_CUR ? $where + pos($$this) : $how == SEEK_END ? length($$this) - $where : croak("Invalid WHENCE specified for StreamString::Se +ek ($how)!"); return pos($$this) = $newpos if $newpos>=0 && $newpos < length($$t +his); # range check return pos($$this); } sub unpack : method { my ($this, $str) = @_; my $p = pos($$this); # this is grossly inefficient. It requires making a (potentially +very large) # copy of the not-yet-unpacked version of the string. If anyone c +an tell me # how to make unpack return the number of bytes it's run through, +OR how to # work the internal APIs behind unpack (unpack_str/unpackstring) t +o make them # do it, please tell me! my @list = CORE::unpack("\@$p $str a*", $$this); # it's very simple. The last thing I return is (a*), which sucks +up the # whole remainder of the string. I figure out how long that is, a +nd from # there I can find out where unpack() stopped reading the data we +actually # wanted. my $remainder = pop @list; pos($$this) = length($$this) - length $remainder; return wantarray ? @list : $list[0]; } sub unpack_at : method { my ($this, $start, $str) = @_; $this->seek($start, SEEK_SET); return $this->unpack($str); } sub main::StreamStringTest { my ($str, $word, $strpos, $bar, $bar2); $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: '$word'\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: '$word'\n"; # Second word is: bar # Skip forward a byte. $str++; $word = $str->unpack('A4'); print "Third word is: '$word'\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was '$word'\n"; # No, wait it was baz $strpos = 0+$str; # god bless overload() print "strpos is $strpos\n"; $strpos = $str->tell; # if you don't like the previous syntax print "strpos is $strpos [2]\n"; $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking print "bar is '$bar'\n"; $bar2 = $str->unpack('@4 A4'); # this works too print "bar2 is '$bar2'\n"; } 1;
An interesting thing I discovered when writing this is that the copy constructor is sometimes called unnecessarily (i.e. when only one reference to an object exists and so no copy needs to be made).
--Stevie-O
$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Comment on Unpack()ing a stream
Select or Download Code
Re: Unpack()ing a stream
by hardburn (Abbot) on May 24, 2004 at 14:33 UTC

    I'd put it in Parse::UnpackStream. I'm not sure that the Parse:: namespace is the best place, but I don't know of any better place for it. It certainly shouldn't be in the root namespace.

      Yeah, I know about the root namespace thing. All of my modules are in the root namespace while I write them, because then I don't need to concern myself with directories. If they work out and I decide they're CPAN-worthy, I then use h2xs to construct the final module directory and transplant the code.
      --Stevie-O
      $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc
Re: Unpack()ing a stream
by diotalevi (Canon) on May 24, 2004 at 16:36 UTC

    Ok, that's more code than I can digest right now. Does it play well with formats like '(N/a*)*'?


    Oops. I originally wrote that as "(a*/N)*" which is wrong - the N comes before the a*. pack will tell you that if you get it wrong.

      What on earth does '(a*/N)*' do? unpack seems to be trying to use it, but I can't work out what kind of input to supply?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

        When given to pack() it consumes a list of strings and returns a single string where every string is now prefixed with the length, encoded in a big-endian 32 bit value. Its fantastic for serializing a bunch of strings. Do note - I wrote the format backwards so the "N" comes before the "A*". pack will tell you this as well so you can't make the mistake and let it just get away from you.

        pack '(N/a*)*', list of strings; $foo = pack '(N/a*)*', "A", "BB", "CCC", "DDDD"; $foo eq ( "\0\0\0\1A" . "\0\0\0\2BB" . "\0\0\0\3CCC" . "\0\0\0\4DDDD" +);
      It should work with any input unpack() itself normally works with. It's a simple (and inefficient) technique: It appends 'a*' to the end of the unpack string and pops off the last element returned. This will be the remainder of the string. A simple subtraction (length($fullstring) - length($remainder)) will reveal the point at which unpack() stopped unpacking the data the user actually asked for.
      --Stevie-O
      $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://355921]
Approved by bassplayer
Front-paged by bmcatt
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (18)
As of 2015-07-01 19:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (17 votes), past polls