Bit order in bytes

by geoffleach (Beadle)
on Dec 10, 2013 at 05:21 UTC (#1066377=perlquestion)
geoffleach has asked for the wisdom of the Perl Monks concerning the following question:

I am (for my sins) the maintainer of Audio::TagLib. There, I said it. My question arises out of constructing code that creates a 32-bit header for MPEG files. The header consists of bit fields of various widths.

This question concerns the ordering of bits in an 8-bit byte, packed into an integer. If I number the bits 7..0, the assignment results in a value that, when unpacked with 'b*', gives me the expected result: 11110010, with bit #7 on the left. The TagLib C++ library with which I communicate expects the field to look like this: 01001111, which is what one gets when unpacking with 'B*'.

A pointer to doc that explains what's behind b vs B decoding would be greatly appreciated.

The actual code is attached.

Thanks.

#!/usr/bin/perl
# Reference taglib-1.9.1 doc TagLib::MPEG::Header
# and http://www.mp3-tech.org/programmer/frame_header.html

package MPEG_Header;

my %header = (
    'FrameSync'     => [31,21],  # Frame sync (all bits must be set)
    'VersionID'     => [20,19],  # MPEG Audio version ID
    'Layer'         => [18,17],  # Layer description
    'Protection'    => [16,16],  # Protection bit
    'BitRate'       => [15,12],  # Bitrate index
    'SamplingRate'  => [11,10],  # Sampling rate frequency index
    'Padding'       => [9,9],    # Padding bit
    'Private'       => [8,8],    # Private bit. This one is only informative.
    'ChannelMode'   => [7,6],    # Channel Mode
    'ModeExtension' => [5,4],    # Mode extension (Only used in Joint stereo)
    'Copyright'     => [3,3],    # Copyright
    'Original'      => [2,2],    # Original
    'Emphasis'      => [1,0],    # Emphasis
);

my %VersionID   = ('2.5' => 0b00, '2' => 0b10, '1' => 0b11, );
my %Layer       = ('III' => 0b01, 'II' => 0b10, 'I' => 0b11, );
my %Padding     = ('Pad' => 0b1, 'NoPad' => 0b0, );
my %Protection  = ('Protected' => 0b0, 'NotProtected' => 0b1, );
my %ChannelMode = ('Stereo' => 0b00, 'JointStereo' => 0b01,
                   'DualChannel' => 0b10, 'SingleChannel' => 0b11, );
my %Copyright   = ('No' => 0b0, 'Yes' => 0b1, );
my %Original    = ('Copy' => 0b0, 'Original' => 0b1, );
my %Emphasis    = ('None' => 0b00, '5015' => 0b01, 'CCIT' => 0b11, );

sub _set_header_field {
    my ($hdr, $field, $value) = @_;
    warn "Field $field is not defined in the MPEG header\n" unless $header{$field};
    my $first = $header{$field}->[0];
    my $last  = $header{$field}->[1];
    $hdr = pack("B*", "0"x32) unless defined $hdr;
    my ($pos, $off);
    if ( defined $value ) {
        # Field value assignment
        # Possible symbolic reference
        # Caution: symbolic references only reach package variables,
        # not the 'my' hashes declared above.
        no strict 'refs';    # For symbolic reference
        $value = $$field{$value} if exists $$field{$value};
        my $width = $first - $last + 1;
        for ($pos = $first; $width; $width --, $pos --) {
            $off = 31 - $pos;
            # Caution: vec() reads $value as a string, and numbers bits
            # from the low-order end of each byte.
            vec($hdr, $off, 1) = vec($value, $width - 1, 1);
        }
        return $hdr;
    }
    # Fill field with 1s
    for ($pos = $first; $pos >= $last; $pos --) {
        $off = 31 - $pos;
        vec($hdr, $off, 1) = 1;
    }
    return $hdr;
}

sub build_header {
    my %args = @_;
    my $header;
    $header = _set_header_field( $header, 'FrameSync' );    # All headers have this
    foreach my $key ( keys %args ) {
        $header = _set_header_field( $header, $key, $args{$key} );
    }
    return $header;
}

1;

=if 0

#! /usr/bin/perl
use lib '.';
use MPEG_Header;

my $mpeg_header = MPEG_Header::build_header(
    'VersionID'    => 2,
    'Layer'        => 'III',
    'Protection'   => 'NotProtected',
    'BitRate'      => 0b1000,    # 64 kbps for V2 and L3
    'SamplingRate' => 0b10,      # 16000 Hz for V2
    'Padding'      => 'Pad',
    'ChannelMode'  => 'Stereo',
    'Copyright'    => 'No',
    'Original'     => 'Yes',
    'Emphasis'     => 'CCIT',
);
printf "b %-s\n", unpack("b*", $mpeg_header);
printf "B %-s\n", unpack("B*", $mpeg_header);

Spaces added. Unpack with b vs B.
b 11111111 11110010 10001000 11000111 00001100
B 11111111 01001111 00010001 11100011 00110000

=cut

Re: Bit order in bytes
by BrowserUk (Pope) on Dec 10, 2013 at 06:07 UTC
    A pointer to doc that explains what's behind b vs B decoding would be greatly appreciated.

    In terms of single bytes, the difference between 'b' & 'B' templates is purely a matter of cosmetics; that is, the same information is being presented differently:

    [0] Perl> print unpack 'B8', chr(65);;
    01000001
    [0] Perl> print unpack 'b8', chr(65);;
    10000010

    Nothing changed in the hardware or the internal representation of 'A', just order in which the bits are presented to the user.

    1. 'b' produces lsb -> msb; left to right.
    2. 'B' produces msb -> lsb; left to right.

    Where the real difference comes in is when dealing with values greater than one byte:

    ## spacing and annotation added manually...
    [0] Perl> print unpack 'b*', "\x12\x34";;
    0100 1000 0010 1100
       2    1    4    3
    [0] Perl> print unpack 'B*', "\x12\x34";;
    0001 0010 0011 0100
       1    2    3    4

    As you can see, not only is the bit-order different, but so is the apparent ordering of the nybbles within the bytes. In part this is due to my using little-endian hardware. If you are using or have access to a big-endian machine, you'd see different results above, but they would still both be just different ways of viewing the same information.

    Again nothing changed in the storage of the values within the memory, the apparent reordering is purely down to the way the bits are displayed.

    So, the difference is just an illusion created by the way you are viewing the bits, and probably not what you should be concentrating on.

    Provided you are calculating the correct bit positions for your use of vec in _set_header_field() -- which is a matter of whether you've done your homework correctly -- how you choose to view those bits (i.e. with 'b' or 'B') is really down to which makes more sense for you.
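To check those bit positions, it may help to see how vec itself numbers bits. The following is a minimal sketch, relying on vec's documented behavior that, with a width of 1, offsets count from the low-order bit of each byte (the same convention as the 'b' template). The variable names are only illustrative.

```perl
use strict;
use warnings;

# Start with four zero bytes, as the original post does.
my $hdr = pack 'B*', '0' x 32;

# Set "bit 0" as vec understands it: the low-order bit of the first byte.
vec($hdr, 0, 1) = 1;

my $b_view     = unpack 'b*', $hdr;        # lsb-first view of each byte
my $B_view     = unpack 'B*', $hdr;        # msb-first view of each byte
my $first_byte = ord substr($hdr, 0, 1);   # numeric value of byte 0

print "b $b_view\n";                       # b 10000000000000000000000000000000
print "B $B_view\n";                       # B 00000001000000000000000000000000
printf "first byte = 0x%02x\n", $first_byte;   # first byte = 0x01
```

So a header bit written with vec at offset 0 lands in the least significant bit of the first byte, which is where 'b' shows it on the left and 'B' shows it on the right.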


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      If you are using or have access to a big-endian machine, you'd see different results above

      Actually, it's the same results - these templates apparently know whether they're on a big-endian or little-endian machine, and adjust themselves accordingly to standardise the output.
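One way to read this result: 'b' and 'B' work on the bytes of a string, one byte at a time, so the host's integer byte order never enters the picture. Byte order only matters at the step that turns an integer into bytes, and the 'n' and 'v' templates fix that order explicitly. A small sketch (the value 0x1234 is just an illustration):

```perl
use strict;
use warnings;

my $big    = pack 'n', 0x1234;   # always the bytes 0x12 0x34
my $little = pack 'v', 0x1234;   # always the bytes 0x34 0x12

my $big_bits    = unpack 'B*', $big;
my $little_bits = unpack 'B*', $little;

print "n: $big_bits\n";      # n: 0001001000110100
print "v: $little_bits\n";   # v: 0011010000010010
```

Both lines should come out the same on any platform, because the byte order was chosen by the pack template rather than by the CPU.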

      Not what I was expecting ... I have, however, just tested this.

      Cheers,
      Rob
        Actually, it's the same results - these templates apparently know whether they're on a big-endian or little-endian machine, and adjust themselves accordingly to standardise the output.

        Hm. That is a surprise.

        Now I am really confused by the apparent nybble swapping:

        print unpack 'b*', "\x12\x34";;
        0100 1000 0010 1100
           2    1    4    3
        print unpack 'B*', "\x12\x34";;
        0001 0010 0011 0100
           1    2    3    4
        print unpack 'b*', pack 'v', 0x1234;;
        0010 1100 0100 1000
           4    3    2    1
        print unpack 'b*', pack 'n', 0x1234;;
        0100 1000 0010 1100
           2    1    4    3

        I can't make sense of that at all.
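A possible way to make sense of it: 'b' does not swap nybbles as such. It prints all eight bits of each byte in reverse order, so the low nybble comes out first and the bits inside each nybble are reversed as well. The order of the bytes is untouched. The sketch below checks that reversing each 8-character group of the 'b*' string gives the 'B*' string:

```perl
use strict;
use warnings;

my $bytes = "\x12\x34";

my $b_view = unpack 'b*', $bytes;   # 0100100000101100
my $B_view = unpack 'B*', $bytes;   # 0001001000110100

# Reverse the characters within each group of eight.
my $rebuilt = join '', map { scalar reverse $_ } $b_view =~ /(.{8})/g;

print "b       $b_view\n";
print "B       $B_view\n";
print "rebuilt $rebuilt\n";
print $rebuilt eq $B_view ? "match\n" : "differ\n";   # match
```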


        It's when you're working with floating point that you'd have to deal with the headache you're anticipating. Int formats are considerably more agreeable to deal with across different endian machines.
      I'll buy the illusion. Is it the case, then, that 'B' decoding is looking at the bytes and presenting them in a big-endian convention? (We're little-endian here.)

      Hopefully this won't confuse things further. Here's the sequence of events in the application.

      o Perl code packs bits into int, numbered 31 to 0.

      o Int is passed to xs code that passes it on to C++ lib

      o C++ coerces the int to unsigned long

      o The long is converted to std::bitset

      o Indexing the bitset, the byte order is correct,

      o The bits are reversed in each byte

        Maybe this will clarify things a little.

        This constructs a union between an unsigned 32-bit integer and a struct containing 32 x 1-bit fields.

        It assigns the value 0x01234567 to the uint and then prints (from C) a string of 0s & 1s to reflect the bitfields, first to last in the struct.

        You'll notice from the output (after __END__), that the first bitfield in the struct maps to the lsb in the integer; and the last to the msb; indicating that this is a little-endian (intel) machine.

        It then passes the uint back to perl, packs it using the little-endian template 'V' and unpacks its bits using both 'b' and 'B'.

        Note that the 'b' template mirrors the ordering of the bits as seen via the bitfields.

        However, once you go beyond that into the realms of C++ coercions and bitsets, I'm afraid you're on your own.

        #! perl -slw
        use strict;
        use Inline C => Config => BUILD_NOISY => 1,;
        use Inline C => <<'END_C', NAME => 'bitfields', CLEAN_AFTER_BUILD =>0;

        #include "mytypes.h"

        union {
            struct {
                unsigned b00:1; unsigned b01:1; unsigned b02:1; unsigned b03:1;
                unsigned b04:1; unsigned b05:1; unsigned b06:1; unsigned b07:1;
                unsigned b08:1; unsigned b09:1; unsigned b10:1; unsigned b11:1;
                unsigned b12:1; unsigned b13:1; unsigned b14:1; unsigned b15:1;
                unsigned b16:1; unsigned b17:1; unsigned b18:1; unsigned b19:1;
                unsigned b20:1; unsigned b21:1; unsigned b22:1; unsigned b23:1;
                unsigned b24:1; unsigned b25:1; unsigned b26:1; unsigned b27:1;
                unsigned b28:1; unsigned b29:1; unsigned b30:1; unsigned b31:1;
            } bits;
            U32 uint;
        } X;

        U32 test( SV *unused ) {
            X.uint = 0x01234567;
            printf( "%u\n", X.uint );
            printf(
                "%c%c%c%c%c%c%c%c" "%c%c%c%c%c%c%c%c"
                "%c%c%c%c%c%c%c%c" "%c%c%c%c%c%c%c%c\n",
                X.bits.b00 ? '1' : '0', X.bits.b01 ? '1' : '0', X.bits.b02 ? '1' : '0',
                X.bits.b03 ? '1' : '0', X.bits.b04 ? '1' : '0', X.bits.b05 ? '1' : '0',
                X.bits.b06 ? '1' : '0', X.bits.b07 ? '1' : '0', X.bits.b08 ? '1' : '0',
                X.bits.b09 ? '1' : '0', X.bits.b10 ? '1' : '0', X.bits.b11 ? '1' : '0',
                X.bits.b12 ? '1' : '0', X.bits.b13 ? '1' : '0', X.bits.b14 ? '1' : '0',
                X.bits.b15 ? '1' : '0', X.bits.b16 ? '1' : '0', X.bits.b17 ? '1' : '0',
                X.bits.b18 ? '1' : '0', X.bits.b19 ? '1' : '0', X.bits.b20 ? '1' : '0',
                X.bits.b21 ? '1' : '0', X.bits.b22 ? '1' : '0', X.bits.b23 ? '1' : '0',
                X.bits.b24 ? '1' : '0', X.bits.b25 ? '1' : '0', X.bits.b26 ? '1' : '0',
                X.bits.b27 ? '1' : '0', X.bits.b28 ? '1' : '0', X.bits.b29 ? '1' : '0',
                X.bits.b30 ? '1' : '0', X.bits.b31 ? '1' : '0'
            );
            return X.uint;
        }
        END_C

        my $uint = test( 1 );
        print unpack 'b*', pack 'V', $uint;
        print unpack 'B*', pack 'V', $uint;

        __END__
        C:\test>bitFields.pl
        19088743
        11100110101000101100010010000000
        11100110101000101100010010000000
        01100111010001010010001100000001
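The same correspondence can be checked in plain Perl, without Inline::C. Assuming, as the output above suggests, that the first bitfield maps to the least significant bit, extracting bit i with a shift and a mask should reproduce the 'b*' string of the 'V'-packed integer. This is only a sketch and the variable names are illustrative:

```perl
use strict;
use warnings;

my $uint = 0x01234567;

# Bit 0 (lsb) first, bit 31 (msb) last, like the bitfield dump.
my $by_shift = join '', map { ($uint >> $_) & 1 } 0 .. 31;

# Little-endian bytes, each shown lsb first.
my $b_view = unpack 'b*', pack 'V', $uint;

print "$by_shift\n";   # 11100110101000101100010010000000
print "$b_view\n";     # 11100110101000101100010010000000
```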

Re: Bit order in bytes
by Anonymous Monk on Dec 10, 2013 at 15:32 UTC
    It has a lot to do with human nomenclature ... do they number the bits from left-to-right (0=MSB), or from right-to-left? Then, entirely separately from this, do they store a multi-byte number high-byte-first or low-byte-first? Finally, how does your programming tool of choice do it? There are no "right" answers. When you are doing bit-twiddling, you have to think of all these things, and it is highly machine-specific. Documentation and project designs need to spell out all of these things explicitly, right there on the front page, and they should also be summarized in documentation within the source code.
      With respect to doc, would it surprise you to hear that there's no mention? No? Me neither.

      That said, someone is messing with my int, reversing the order of the bits per-byte. What I conclude from your reply is that I'll just have to live with it. Sigh! And thanks.
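If the per-byte reversal comes from building the header with vec (which numbers bits from the low-order end of each byte), one option is to assemble the value with shifts and pack it with 'N', so that header bit 31 becomes the most significant bit of the first byte. This is only a sketch under that assumption: set_field and %field are hypothetical names, not part of Audio::TagLib or TagLib, and only a few of the header fields are shown.

```perl
use strict;
use warnings;

# [high bit, low bit] for a few fields, numbered 31..0 as in the post.
my %field = (
    FrameSync  => [31, 21],
    VersionID  => [20, 19],
    Layer      => [18, 17],
    Protection => [16, 16],
);

# Hypothetical helper: store $value into the named field of integer $hdr.
sub set_field {
    my ($hdr, $name, $value) = @_;
    my ($hi, $lo) = @{ $field{$name} };
    my $mask = (1 << ($hi - $lo + 1)) - 1;
    $hdr &= ~($mask << $lo);                    # clear the field
    return $hdr | (($value & $mask) << $lo);    # then write the new value
}

my $hdr = 0;
$hdr = set_field($hdr, 'FrameSync',  0x7FF);    # all eleven sync bits set
$hdr = set_field($hdr, 'VersionID',  0b10);     # MPEG version 2
$hdr = set_field($hdr, 'Layer',      0b01);     # Layer III
$hdr = set_field($hdr, 'Protection', 0b1);      # not protected

my $bytes = pack 'N', $hdr;                     # big-endian byte order
my $bits  = unpack 'B*', $bytes;                # msb-first view
print "$bits\n";   # 11111111111100110000000000000000
```

Alternatively, a string that was already built with vec can have every byte mirrored in one step with pack 'B*', unpack 'b*', $string, since that reads each byte lsb first and writes it back msb first.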
