
Can unpack add zero bytes before converting?

by mossi2000 (Initiate)
on Sep 12, 2021 at 13:29 UTC (#11136689=perlquestion)

mossi2000 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm using Perl to parse the byte-stream output of a hardware device.
Depending on the hardware configuration, I get a stream of 40-, 48-, 56- or 64-bit little-endian values,
i.e. the lower 5, 6 or 7 bytes, or all 8 bytes, of a 64-bit little-endian integer.
I tried to convert this data with unpack, but whatever I tried with 'x' or '@',
I did not succeed in (p)adding the missing zero bytes before converting to a 64-bit integer.
(I'm using a Perl with support for 64-bit integers.)
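To make the data format concrete, here is a small sketch (assuming a Perl built with 64-bit integers, so that the `Q` pack format is available) showing that a 40-bit value is simply the lowest 5 bytes of its 64-bit little-endian encoding. The helper name `low_bytes` is hypothetical, for illustration only.

```perl
use strict;
use warnings;
no warnings 'portable';    # silence "Hexadecimal number > 0xffffffff non-portable"

# Hypothetical helper: the lowest $n bytes of a 64-bit little-endian integer,
# i.e. what the hardware would send for an ($n * 8)-bit value.
sub low_bytes {
    my ($value, $n) = @_;
    return substr(pack('Q<', $value), 0, $n);
}

my $value = 0xf_dead_beef_4;    # 0xfdeadbeef4, fits in 40 bits
for my $n (5 .. 8) {
    printf "%d bytes: %s\n", $n, unpack('H*', low_bytes($value, $n));
}
```

The 5-byte case prints `f4eedbeafd`; the wider cases just append `00` bytes, which are exactly the zero bytes that would have to be added back before reading a full 64-bit integer.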

My current solution (using bitstrings) looks like this:

    use strict; use warnings; no warnings 'portable';

    my $bytes_per_value = 5;    # simulate the byte stream using 40 bits = 5 * 8 bits
    my $value     = 0xf_dead_beef_4;
    my $bin_value = substr (pack ('Q', $value), 0, $bytes_per_value);
    my $buffer    = $bin_value x 4;
    my $nbytes    = length ($buffer);
    my $fmt       = sprintf "(b%d)*", $bytes_per_value << 3;   # one 40-bit bitstring per value
    my @stream_data = unpack ($fmt, substr ($buffer, 0, $nbytes));
    my @values    = map { oct '0b'.reverse ($_)} @stream_data;  # LSB-first bits -> MSB-first literal
    foreach my $v (@values) {
        printf "0x%x\n", $v;
    }
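The bitstring approach relies on the `b` format emitting each byte's bits in ascending order (least significant bit first). For little-endian data the whole bitstring is therefore least-significant-bit first, so reversing it gives the most-significant-first binary literal that `oct '0b...'` expects. A minimal two-byte check (the value 0x0102 is arbitrary):

```perl
use strict;
use warnings;

my $bytes = pack 'v', 0x0102;         # little-endian 16-bit: "\x02\x01"
my ($bits) = unpack 'b16', $bytes;    # bits of each byte, LSB first, in byte order
my $msb_first = reverse $bits;        # scalar reverse: whole number, MSB first
my $decoded = oct "0b$msb_first";

printf "%s -> %s -> 0x%04x\n", $bits, $msb_first, $decoded;
```

This prints `0100000010000000 -> 0000000100000010 -> 0x0102`.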

My question:
Is there a way to unpack this stream directly into an array of QWords (64 bit)
using some form of unpack for 5-, 6- and 7-byte data? (The 8-byte case is obviously easy :-) )

In other words: can I specify in the format string that 5, 6 or 7 bytes plus 3, 2 or 1 zero padding bytes
should be converted to a 64-bit integer?
Or can unpack only work on bytes that actually exist in the data?
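For what it's worth, in `unpack` the `x`, `X` and `@` codes only move the read position within the input string; unlike in `pack`, they never synthesize zero bytes. That would explain why the attempts with `x` and `@` could not pad the data. A quick illustration on an arbitrary five-byte string:

```perl
use strict;
use warnings;

my $data = "ABCDE";
my @skip = unpack 'a2 x a2',  $data;   # x:  skip forward one byte -> AB, DE
my @back = unpack 'a2 X a2',  $data;   # X:  back up one byte      -> AB, BC
my @abs  = unpack 'a1 @3 a2', $data;   # @3: jump to offset 3      -> A, DE

print "@skip | @back | @abs\n";
```

This prints `AB DE | AB BC | A DE`: every value comes from bytes that already exist in `$data`.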

Thanks for any hints!


Re: Can unpack add zero bytes before converting?
by vr (Curate) on Sep 13, 2021 at 11:19 UTC

    You can borrow the missing bytes (up to a total of 8) from the adjacent value and then throw them away in post-processing by applying a mask. It looks like it's faster to stay "numeric" at all stages, if possible. The "unpack-pad-pack-unpack" approach (tybalt89's solution) is somewhere in the middle in terms of performance. You didn't say speed is the goal, but anyway:

    use strict; use warnings; no warnings 'portable';
    use Benchmark 'cmpthese';

    my $bytes_per_value = 5;
    my $count     = 1_000;
    my $value     = 0xf_dead_beef_4;
    my $bin_value = substr (pack ('Q', $value), 0, $bytes_per_value);
    my $buffer    = $bin_value x $count;

    my $fmt     = sprintf "(b%d)*", $bytes_per_value << 3;
    my $pad_len = 8 - $bytes_per_value;    # bytes borrowed from the next value
    my $mask    = ~ 0 >> $pad_len * 8;     # keeps only the low $bytes_per_value bytes

    cmpthese -1, {
        strings => sub {
            my @values = map { oct '0b'.reverse ($_)} unpack ($fmt, $buffer);
            return \@values;
        },
        numbers => sub {
            # read 8 bytes, back up over the borrowed ones; the padded tail feeds the last Q
            my @values = unpack "(QX$pad_len)$count", $buffer . "\0" x $pad_len;
            $_ &= $mask for @values;
            return \@values;
        },
    };
    __END__
                Rate strings numbers
    strings   2386/s      --    -77%
    numbers  10274/s    331%      --

      I had to include an explicit count in the template, "(QX$pad_len)$count", instead of simply "(QX$pad_len)*", because otherwise the @values array ended up with a very puzzling 2501 items instead of 1000, i.e. $count.
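The borrow-and-mask technique can be wrapped in a small helper that derives the repeat count from the buffer length, so the template never relies on `*`. This is a sketch under the assumptions of the thread: a 64-bit Perl, values of 5 to 8 bytes, and a buffer whose length is an exact multiple of the value size. The name `decode_borrow_mask` is hypothetical.

```perl
use strict;
use warnings;
no warnings 'portable';

# Hypothetical helper: decode $n-byte little-endian values (5..8) into 64-bit
# integers by reading 8 bytes at a time, backing up over the borrowed bytes
# with X, and masking the borrowed bytes off afterwards.
sub decode_borrow_mask {
    my ($n, $stream) = @_;
    die "value size must be 5..8\n" unless $n >= 5 && $n <= 8;
    die "stream length is not a multiple of $n\n" if length($stream) % $n;

    my $pad_len = 8 - $n;
    my $count   = length($stream) / $n;    # explicit count instead of '*'
    my $tmpl    = $pad_len ? "(Q<X$pad_len)$count" : "(Q<)$count";
    my @values  = unpack $tmpl, $stream . "\0" x $pad_len;

    if ($pad_len) {                        # 8-byte values need no mask
        my $mask = ~0 >> (8 * $pad_len);
        $_ &= $mask for @values;
    }
    return @values;
}

my $chunk = substr(pack('Q<', 0xf_dead_beef_4), 0, 5);
printf "0x%x\n", $_ for decode_borrow_mask(5, $chunk x 3);
```

The usage line prints `0xfdeadbeef4` three times. The zero padding appended to the buffer only ever feeds the last read, so every `Q<` has a full 8 bytes available.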

      I suspect there's special code somewhere to prevent the ((un?)documented(?)) endless loops with '(CX)*', '(vX2)*' or similar templates, but still:

      use feature 'say';
      say for unpack '(VX2)*', "\1\1\1\1";   # why 2 items?
      say for unpack '(VX3)*', "\1\1\1\1";   # dies with: 'X' outside of string in unpack (how's that?)

      And why 2501 items with the '(QX3)*' template in the first place?

Re: Can unpack add zero bytes before converting?
by Anonymous Monk on Sep 12, 2021 at 16:45 UTC

    I have not been able to come up with a way to do it in a single unpack, but if I post my solution maybe someone will come up with something better:

    # Arguments are datum length (5, 6, 7, or 8) and stream data (whose
    # length must be a multiple of the datum length)
    sub unpack_stream {
        my ( $length, $stream ) = @_;
        $length ||= 5;
        $length >= 5 and $length <= 8
            or die "Length $length must be between 5 and 8 inclusive\n";
        my $pad = pack 'C*', ( 0 ) x ( 8 - $length );
        return map { unpack 'Q<', "$_$pad" } unpack "(a$length)*", $stream;
    }

      Maybe unpack can't, but pack can :)


      return unpack '(Q<)*', pack '(a8)*', unpack "(a$length)*", $stream;

      And you no longer need $pad or the map.
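The trick here is that `pack 'a8'` null-pads a shorter string to eight bytes (and would truncate a longer one), so `pack '(a8)*'` zero-extends every 5-, 6- or 7-byte chunk before the final `unpack '(Q<)*'`. A small sketch of just the padding step, using arbitrary short strings:

```perl
use strict;
use warnings;

my @chunks = ("abc", "defgh");
my $padded = pack '(a8)*', @chunks;    # each chunk null-padded to 8 bytes

printf "%d bytes: %s\n", length $padded, unpack('H*', $padded);
```

This prints `16 bytes: 61626300000000006465666768000000`, i.e. each chunk followed by its zero padding.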

      semi-tested :)

        See? I knew someone would come up with something cleaner. Thanks.

        I understand the OP to have requested that it be done in a single unpack, but I'm convinced that is not possible -- a challenge for someone to come along and prove me wrong. And that is how knowledge grows.

Re: Can unpack add zero bytes before converting?
by AnomalousMonk (Bishop) on Sep 13, 2021 at 02:10 UTC

    I don't clearly understand your problem, but it reminds me of the recent question Split any number into string of 8-bit hex values (=1 byte). Perhaps it may give you some insight concerning a solution.

    One important lesson to take away from that post is the importance of clearly stating a problem. :)

    Give a man a fish:  <%-{-{-{-<

Re: Can unpack add zero bytes before converting?
by mossi2000 (Initiate) on Sep 13, 2021 at 13:19 UTC
    Hi to all who have answered!

    Thanks for all your input.
    Of course my question was driven by the "need for speed".
    I tried to minimize the number of passes over the buffer (2k, 4k, 8k or 16k values).
    I extended the benchmark script and put all the different solutions in there.
    You can see the results below.

    Three things:
    - I was astonished that the unpack_pack_unpack approach is still almost twice as fast as my solution using bitstrings.
    - The solution that borrows the padding to 8 bytes from the next value is really nifty! With a single padding of the buffer,
      one can very nearly achieve what I originally wanted: one unpack only.
    - Next surprise: the
        $_ &= $mask for @values;
      used there does the same as
        map { $_ &= $mask } @values;
      but is about 25% faster. Probably due to the needless generation of map's result list...
      (The variant @values = map { $_ &= $mask } @values; only reaches a rate of 4000/s!)
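Both forms work because `$_` is an alias to each array element, both in a `for` loop and inside a `map` block, so `&=` modifies `@values` in place in either case; the two differ only in what happens to `map`'s own return list. A minimal check with arbitrary values:

```perl
use strict;
use warnings;

my $mask = 0xff;
my @a = (0x1234, 0xabcd);
my @b = @a;

$_ &= $mask for @a;        # for: $_ aliases each element of @a
map { $_ &= $mask } @b;    # map: $_ aliases each element of @b too (void context)

printf "%s / %s\n",
    join(',', map { sprintf('0x%x', $_) } @a),
    join(',', map { sprintf('0x%x', $_) } @b);
```

This prints `0x34,0xcd / 0x34,0xcd`.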

    So of course I'll go with the fastest solution, which has the advantage of covering the
    easy case of 8 bytes per value at no extra cost when I pre-compute the unpack format.

    Thanks again,

    use strict; use warnings; no warnings 'portable';
    use Benchmark 'cmpthese';

    my $bytes_per_value = 5;
    my $count     = 1_000;
    my $value     = 0xf_dead_beef_4;
    my $bin_value = substr (pack ('Q', $value), 0, $bytes_per_value);
    my $buffer    = $bin_value x $count;

    my $fmt     = sprintf "(b%d)*", $bytes_per_value << 3;
    my $pad_len = 8 - $bytes_per_value;
    my $mask    = ~ 0 >> $pad_len * 8;
    my $pad     = "\0" x $pad_len;

    # pre-computed "borrow from the next value" templates, indexed by pad length
    my @fmts = ('(Q<)', '(Q<X)', '(Q<X2)', '(Q<X3)');
    my $fmt2 = $fmts[$pad_len] . sprintf ("%d", $count);

    cmpthese -1, {
        strings => sub {
            my @values = map { oct '0b'.reverse ($_)} unpack ($fmt, $buffer);
            return \@values;
        },
        map_unpack => sub {
            my @values = map { unpack 'Q<', "$_$pad" }
                         unpack "(a$bytes_per_value)*", $buffer;
            return \@values;
        },
        unpack_pack_unpack => sub {
            my @values = unpack '(Q<)*', pack '(a8)*',
                         unpack "(a$bytes_per_value)*", $buffer;
        },
        numbers_map => sub {
            my @values = unpack "(QX$pad_len)$count", $buffer.$pad;
            map { $_ &= $mask } @values;
            return \@values;
        },
        numbers_for => sub {
            my @values = unpack $fmt2, $buffer.$pad;
            if ($pad_len) { $_ &= $mask for @values; }
            return \@values;
        },
    };
    __END__
                         Rate strings map_unpack unpack_pack_unpack numbers_map numbers_for
    strings            1707/s      --       -28%               -47%        -69%        -77%
    map_unpack         2386/s     40%         --               -26%        -56%        -68%
    unpack_pack_unpack 3242/s     90%        36%                 --        -41%        -57%
    numbers_map        5462/s    220%       129%                68%          --        -28%
    numbers_for        7538/s    342%       216%               133%         38%          --

Node Type: perlquestion [id://11136689]
Front-paged by Corion