PerlMonks  

Character in 'b' format wrapped in unpack

by BrowserUk (Patriarch)
on Mar 28, 2015 at 15:14 UTC

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

What does the error message mean?

$x = "\x55\xAA\x55\xAA";;
substr( $x, $_, 1 ) = chr( ord( substr( $x, $_, 1 )) << 1 ) for 0 .. 3;
print unpack 'b*', $x;;
Character in 'b' format wrapped in unpack at (eval 34) line 1, <STDIN> line 26.
Character in 'b' format wrapped in unpack at (eval 34) line 1, <STDIN> line 26.
01010101001010100101010100101010

Note: The error comes from the unpack line, not from the cobbled-together bit-string shift-left.

Did the shifting manage to create a wide character?
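One way to check is to look for any character whose code point is above 0xFF. A minimal sketch using only core Perl; has_wide_chars is a hypothetical helper name, not a Perl built-in:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): returns 1 if any character in the
# string has a code point above 0xFF, i.e. cannot be stored as a single byte.
sub has_wide_chars {
    my ($s) = @_;
    return $s =~ /[^\x00-\xFF]/ ? 1 : 0;
}

my $x = "\x55\xAA\x55\xAA";

# The same per-character shift as in the question.
substr( $x, $_, 1 ) = chr( ord( substr( $x, $_, 1 ) ) << 1 ) for 0 .. 3;

printf "%vX\n", $x;    # code point of each character, dot-separated
print has_wide_chars($x) ? "wide characters present\n" : "bytes only\n";
```

If it reports wide characters, a byte-oriented unpack format such as 'b' has to squeeze each of them into 8 bits, which is what the warning reports.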


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re: Character in 'b' format wrapped in unpack
by choroba (Cardinal) on Mar 28, 2015 at 15:21 UTC
    Did the shifting manage to create a wide character?
    It seems so. I added
    print $x;

    to the script and now I'm getting:

    Wide character in say at ./1.pl line 10.

    The output through xxd:

    0000000: c2aa c594 c2aa c594 0a30 3130 3130 3130  .........0101010
    0000010: 3130 3031 3031 3031 3030 3130 3130 3130  1001010100101010
    0000020: 3130 3031 3031 3031 300a                 100101010.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Dratted Unicrap bites again!


        Use bytes.
        The only involvement of Unicode is the use of UTF-8 when you pass invalid data to print. That usually means you forgot to encode the argument, so Perl does it for you (but warns you that it did). Don't pass invalid data to print, and Unicode won't be involved.
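Following that advice, a sketch that encodes the string to UTF-8 explicitly with the core Encode module before printing, so print only ever sees bytes. The octets it produces should be the same ones in choroba's xxd dump (c2aa c594 ...). The string literal is an assumption standing in for what the shift loop in the question produces:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $x = "\xAA\x{154}\xAA\x{154}";    # what the shift loop produces

# Encode the characters to UTF-8 octets explicitly, rather than letting
# print do it implicitly (and warn "Wide character in print").
my $octets = encode( 'UTF-8', $x );

# Dump the octets in hex, one per byte.
my $hex = join ' ', map { sprintf '%02x', $_ } unpack 'C*', $octets;
print "$hex\n";
```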
Re: Character in 'b' format wrapped in unpack
by graff (Chancellor) on Mar 29, 2015 at 04:31 UTC
    Did the shifting manage to create a wide character?

    It's not the shifting per se that creates the wide character; when you pass a numeric value greater than 255 to chr, it must return a wide character.
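To illustrate: chr(0xAA << 1) is one character with code point 0x154, and unpack's 'b' format then keeps only its low 8 bits (0x54), which is where the 00101010 groups in the question's output come from. A minimal sketch; the low-byte behaviour is inferred from the output in the question, and warnings are switched off only around the unpack that triggers the wrap:

```perl
use strict;
use warnings;

my $c = chr( 0xAA << 1 );    # code point 0x154: too large for one byte
printf "length %d, code point %X\n", length($c), ord($c);

my $wrapped;
{
    no warnings;             # silence "Character in 'b' format wrapped in unpack"
    $wrapped = unpack 'b8', $c;
}
my $low_byte = unpack 'b8', chr( ord($c) & 0xFF );    # 0x54, the low 8 bits

print "$wrapped $low_byte\n";
```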

    (I'm afraid I don't quite understand the reason(s) for what happens when the "use bytes" pragma is added -- if I've done it right, the only difference is to eliminate the warning message about the "wrapped character in unpack"; the resulting output is not changed. Is it the case that you got the particular pattern of zeros and ones you expected, and were just complaining about the warning message?)

      when you pass a numeric value greater than 255 to chr, it must return a wide character.

      There is no "must" about it. Unless I specifically ask for Unicrap, characters should be assumed to be 8 bits.

      I'm afraid I don't quite understand the reason(s) for what happens when the "use bytes" pragma is added -- if I've done it right, the only difference is to eliminate the warning message about the "wrapped character in unpack"

      You're right. It does just enough to lull you into a false sense of security; then sneaks around behind and kicks you in the nuts!

      Rant:

      Is it the case that you got the particular pattern of zeros and ones you expected, and were just complaining about the warning message?)

      No. I wanted the shift to discard the high bit, as it does with integers:

      $n <<= 1; print unpack 'B*', pack 'N', $n;;
      10101011010101001010101101010100
      $n <<= 1; print unpack 'B*', pack 'N', $n;;
      01010110101010010101011010101000
      $n <<= 1; print unpack 'B*', pack 'N', $n;;
      10101101010100101010110101010000
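A sketch of the integer case with the truncation made explicit. In the snippet above it is pack 'N' that keeps only the low 32 bits; masking with 0xFFFFFFFF does the same in the arithmetic itself. The starting value 0x55AA55AA is an assumption (the thread doesn't show it), picked because one shift of it yields the first line of output above:

```perl
use strict;
use warnings;

my $n = 0x55AA55AA;    # assumed starting value; the thread does not show it
my @lines;
for ( 1 .. 3 ) {
    $n = ( $n << 1 ) & 0xFFFFFFFF;    # shift left, then keep only the low 32 bits
    push @lines, unpack( 'B*', pack( 'N', $n ) );
}
print "$_\n" for @lines;
```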

      Unfortunately, Unicrap (and Perl's implementation of Unicrap) conspire such that you can no longer rely upon simple byte semantics.

      The idea that a string (a good old array of bytes) can suddenly contain a random Unicrap character, in a program that doesn't (and doesn't want to) use any Unicrap, is a farce!


        With all due respect for your justifiable anger, I'm sorry to disagree; "chr()" is - and rightly should be - intended to serve the (dominant) linguistic sense of "character" (what the perl docs call "character semantics"), rather than the strictly-typed, C-centric sense of "char" (what the perl docs call "octets" or "byte semantics").

        In other words, when you want to do low-level, C-like bit twiddling, just use pack and unpack - that's what those are for - and give up on pretending that higher-level, linguistically oriented functions (chr and ord) can do the same thing.
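Following that suggestion, a sketch of the per-byte shift using only unpack and pack: unpack 'C*' turns the bytes into plain integers, each is shifted and masked with & 0xFF, and pack 'C*' rebuilds a byte string. shift_bytes_left is a hypothetical name; the mask is what stops pack from being handed values above 255:

```perl
use strict;
use warnings;

# Hypothetical helper: shift every byte of a byte string left by one bit,
# dropping each byte's high bit (no carry between bytes, like the loop in
# the question).
sub shift_bytes_left {
    my ($bytes) = @_;
    return pack 'C*', map { ( $_ << 1 ) & 0xFF } unpack 'C*', $bytes;
}

my $x = shift_bytes_left("\x55\xAA\x55\xAA");
print unpack( 'b*', $x ), "\n";    # no "wrapped" warnings: every value fits in a byte
```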

        I agree it's sad that every user must pay the performance cost of Unicode support, whether or not they actually need or use it. But then, it's also sad that every script must pay the overhead of untyped variables, no matter how much of that flexibility is actually needed or used.

        UPDATE: Having said that, I realize I'm probably still deficient in my understanding of your particular example. You said you "wanted the shift to discard the high bit, as it does with integers", and if I'm not mistaken (am I?), that's actually what happens, with or without the "use bytes" pragma (i.e. with or without the warning). Here's a simpler example - am I missing something?

        #use bytes;
        $x = "\xAA";
        print unpack 'B*', $x;
        print " --> ";
        $x = chr( ord( $x ) << 1 );
        print unpack 'B*', $x;
        print "\n---\n";
        $x = pack( "C*", 0xAA );
        print unpack 'B*', $x;
        print " --> ";
        $x = pack( "C", unpack( "C", $x ) << 1 );
        print unpack 'B*', $x;
        print "\n---\n";
        When I run that, I get:
        Character in 'B' format wrapped in unpack at /tmp/j1.pl line 7.
        10101010 --> 01010100
        ---
        Character in 'C' format wrapped in pack at /tmp/j1.pl line 14.
        10101010 --> 01010100
        ---
        (Note the warning from using the "C" format on pack.) Looks to me like the high bit got shifted off in both cases - no difference. When I uncomment the "use bytes", the only difference I see is that the "B" format warning goes away (but the "C" format warning still shows up.) Is there a problem I'm not seeing?

        In case it matters, I'm using perl 5.18 on Mac OS X 10.10.2 ("Yosemite").

Re: Character in 'b' format wrapped in unpack (U0 unicode mode, C0 character mode)
by Anonymous Monk on Mar 28, 2015 at 22:42 UTC

    Hmm???

    #!perl -lw
    $x = "\x55\xAA\x55\xAA";;
    substr( $x, $_, 1 ) = chr( ord( substr( $x, $_, 1 )) << 1 ) for 0 .. 3;;
    my $ox = $x;
    print unpack 'b*', $x;;
    print unpack 'U0b*', $x;;
    print unpack 'C0b*', $x;;
    __END__
    Character in 'b' format wrapped in unpack at - line 5.
    Character in 'b' format wrapped in unpack at - line 5.
    01010101001010100101010100101010
    0100001101010101101000110010100101000011010101011010001100101001
    Character in 'b' format wrapped in unpack at - line 7.
    Character in 'b' format wrapped in unpack at - line 7.
    01010101001010100101010100101010

    Hmm??

    $ perl
    #!perl -lw
    $x = "\x55\xAA\x55\xAA";;
    substr( $x, $_, 1 ) = chr( ord( substr( $x, $_, 1 )) << 1 ) for 0 .. 3;;
    my $ox = $x;
    use bytes;
    print unpack 'b*', $x;;
    print unpack 'U0b*', $x;;
    print unpack 'C0b*', $x;;
    __END__
    0100001101010101101000110010100101000011010101011010001100101001
    11000011010000010100001101010101110000111010000101000011001010011100001101000001010000110101010111000011101000010100001100101001
    0100001101010101101000110010100101000011010101011010001100101001
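The U0 results can be cross-checked. Assuming U0 at the start of an unpack template makes unpack read the string's UTF-8 form byte by byte (as the output above suggests), 'U0b*' should give the same bits as a plain 'b*' on the string after Encode::encode. A minimal sketch of that comparison; the string literal stands in for what the shift loop produces:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $x = "\xAA\x{154}\xAA\x{154}";    # what the shift loop produces

my $via_u0     = unpack 'U0b*', $x;                    # U0: read the UTF-8 form byte by byte
my $via_encode = unpack 'b*', encode( 'UTF-8', $x );   # encode first, then read plain bytes

print "$via_u0\n";
print $via_u0 eq $via_encode ? "same bits\n" : "different bits\n";
```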
Re: Character in 'b' format wrapped in unpack
by ikegami (Patriarch) on Mar 29, 2015 at 23:06 UTC
    printf "%v02X\n", $x;
    gives
    AA.154.AA.154
    You created characters that are too large to be bytes, and 'b' expects the characters to be bytes. Maybe
    ord( substr( $x, $_, 1 ) ) << 1
    should be
    ( ord( substr( $x, $_, 1 ) ) << 1 ) & 0xFF
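Applied to the loop from the question, that suggestion gives the sketch below. Every character stays below 0x100, so unpack has nothing to wrap; the bit string comes out the same as in the question (the wrap was already dropping the high bit), just without the warnings:

```perl
use strict;
use warnings;

my $x = "\x55\xAA\x55\xAA";

# Shift each byte left by one bit and mask to 8 bits, so chr() never
# receives a value above 255 and the string stays a byte string.
substr( $x, $_, 1 ) = chr( ( ord( substr( $x, $_, 1 ) ) << 1 ) & 0xFF )
    for 0 .. length($x) - 1;

printf "%v02X\n", $x;
my $bits = unpack 'b*', $x;    # no "wrapped" warnings now
print "$bits\n";
```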

Node Type: perlquestion [id://1121667]
Approved by ww