
Re^5: Character in 'b' format wrapped in unpack

by choroba (Bishop)
on Mar 29, 2015 at 22:23 UTC

in reply to Re^4: Character in 'b' format wrapped in unpack
in thread Character in 'b' format wrapped in unpack

When you do my $thing = chr( 12345 ); what does that "character" represent?

Is it a Chinese character? Or Sanskrit? Or Cyrillic?

Is it UTF-8, UTF-16, or UTF-32?

Is it big-endian or little-endian?

It's Unicode. It's HANGZHOU NUMERAL TWENTY (U+3039), in fact. UTF-8 and UTF-16 both represent Unicode code points, but they encode them differently.
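To make that distinction concrete, here is a minimal sketch using the core Encode and charnames modules. The helper name hex_bytes is made up for this illustration; the point is only that the character stays the same code point while its byte representation depends on the encoding you ask for.

```perl
use strict;
use warnings;
use Encode qw(encode);
use charnames ();

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

my $thing = chr(12345);    # a character, i.e. a code point; no encoding yet

printf "U+%04X %s\n", ord($thing), charnames::viacode(ord $thing);

# The same code point, serialized under four different encodings.
for my $encoding (qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE)) {
    printf "%-8s %s\n", $encoding, hex_bytes(encode($encoding, $thing));
}
# UTF-8    e3 80 b9
# UTF-16BE 30 39
# UTF-16LE 39 30
# UTF-32BE 00 00 30 39
```

Endianness and the choice of UTF only come into play at the encode step; chr() itself knows nothing about either.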

When you concatenate a different string to it, the result might depend on the version of Perl. See the unicode_strings feature (perldoc feature).

Re^6: Character in 'b' format wrapped in unpack
by BrowserUk (Pope) on Mar 29, 2015 at 23:04 UTC
    It's Unicode.

    Great! Then this must be unicode also:

    perl -MDevel::Peek -E"$x = chr(129).chr(130).chr(42).chr(131).chr(132); Dump($x); substr( $x, 2, 1 ) = chr(~0); Dump($x); print $x" | od -t x1
    SV = PV(0xbadc0) at 0x2c5aa8
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0xb67a8 "\201\202*\203\204"\0
      CUR = 5
      LEN = 8
    SV = PVMG(0x2b1078) at 0x2c5aa8
      REFCNT = 2
      FLAGS = (SMG,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x2b3008 "\302\201\302\202\377\200\217\277\277\277\277\277\277\277\277\277\277\302\203\302\204"\0 [UTF8 "\x{81}\x{82}\x{ffffffffffffffff}\x{83}\x{84}"]
      CUR = 21
      LEN = 24
      MAGIC = 0x3177f8
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = -1
    Wide character in print at -e line 1.
    0000000 c2 81 c2 82 ff 80 8f bf bf bf bf bf bf bf bf bf
    0000020 bf c2 83 c2 84
    0000025


      Yes, that code is indeed garbage. Either 0xffffffffffffffff shouldn't have been added to the string, or the string shouldn't have been passed to print. Without an encoding layer on the filehandle, print expects every character to be a byte (0..255); anything larger triggers the "Wide character in print" warning seen above.
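A small sketch of the difference an encoding layer makes, writing to in-memory filehandles so the resulting bytes can be inspected. The helper written_bytes is a name invented for this example, and U+3039 is used only as a convenient character above 255.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string through a handle opened with $mode
# and return the bytes that were written, as space-separated hex.
sub written_bytes {
    my ($mode, $string) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $buffer;
}

my $low = chr(129) . chr(130);    # every character fits in a byte

# No encoding layer: characters 0..255 go out as single bytes.
print written_bytes('>', $low), "\n";                     # 81 82

# With a UTF-8 layer: the same characters are encoded on output.
print written_bytes('>:encoding(UTF-8)', $low), "\n";     # c2 81 c2 82

# Characters above 255 only make sense on a handle that has a layer.
print written_bytes('>:encoding(UTF-8)', chr(0x3039)), "\n";    # e3 80 b9
```

For STDOUT, the equivalent would be something like binmode(STDOUT, ':encoding(UTF-8)') before printing, or encoding the string explicitly and printing the resulting bytes.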

