http://www.perlmonks.org?node_id=1186852

vr has asked for the wisdom of the Perl Monks concerning the following question:

A "binary" file for us:

C:\>perl -e "print qq(\xB5)" > data.bin

And:

use strict;
use warnings;
use feature 'say';
use Encode qw/ _utf8_off _utf8_on is_utf8 /;
use utf8;
use Devel::Peek;

my $s1 = ' ';       # a space (anything)
_utf8_on( $s1 );    # or assign not-ascii above, instead
my $s2 = $s1;

open my $fh, '<', 'data.bin';
binmode $fh;

sysread $fh, $s1, 1;
Dump $s1;

seek $fh, 0, 0;
$s2 = do { local $/; <$fh> };
Dump $s2;

SV = PVMG(0xc149ec) at 0xc20dec
  REFCNT = 1
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0xc15a1c "\302\265"\0 [UTF8 "\x{b5}"]
  CUR = 2
  LEN = 10
  MAGIC = 0xc13ffc
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1
SV = PV(0x3f9f6c) at 0xc20f0c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0xc2e6a4 "\265"\0
  CUR = 1
  LEN = 10

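So sysread into a scalar that already has the UTF8 flag on keeps the flag and stores the byte upgraded ("\302\265"), while slurping the same file gives a plain one-byte string. Not sure if it's a bug or not.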
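A minimal sketch for checking how far the two results actually differ at the Perl level, as opposed to the internal representation that Dump shows. It recreates the one-byte file in a temporary location (an assumption for portability, in place of data.bin) and uses utf8::upgrade instead of _utf8_on to set the flag; both choices are mine, not part of the original test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Recreate the one-byte file (0xB5) in a temporary location.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xB5";
close $out or die "close: $!";

# Target scalar with the UTF8 flag already set, as in the post.
my $s1 = ' ';
utf8::upgrade($s1);

open my $fh, '<:raw', $file or die "open: $!";
defined sysread($fh, $s1, 1) or die "sysread: $!";
close $fh;

# Slurp the same file through a separate handle for comparison.
open my $fh2, '<:raw', $file or die "open: $!";
my $s2 = do { local $/; <$fh2> };
close $fh2;

printf "s1 flagged: %s, s2 flagged: %s\n",
    (utf8::is_utf8($s1) ? 'yes' : 'no'),
    (utf8::is_utf8($s2) ? 'yes' : 'no');
printf "equal as strings: %s\n", ($s1 eq $s2 ? 'yes' : 'no');

# Force the byte representation on a copy; with FAIL_OK set,
# downgrade returns false instead of dying if that is impossible.
my $bytes = $s1;
utf8::downgrade($bytes, 1) or warn "cannot downgrade\n";
printf "after downgrade: flagged %s, ord 0x%02X\n",
    (utf8::is_utf8($bytes) ? 'yes' : 'no'), ord $bytes;
```

If the strings compare equal, the difference only matters to code that looks at the raw buffer without checking the flag, which is typically XS code.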

The documentation for sysread says: "Note that if the filehandle has been marked as :utf8, Unicode characters are read instead of bytes (the LENGTH, OFFSET, and the return value of sysread are in Unicode characters)."

Does this imply that, if the filehandle has not been marked, OFFSET is counted in bytes? If so, could the utf8 in the target scalar become invalid?
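One way to probe the OFFSET question empirically is sketched below. It assumes a target scalar holding a single character above 0xFF (so it takes several bytes internally and the flag is necessarily on) and reads one byte at OFFSET 1 from an unmarked handle. The temporary file and the variable names are my own choices. If OFFSET were counted in bytes here, the write would land in the middle of that character.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Recreate the one-byte file (0xB5) in a temporary location.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xB5";
close $out or die "close: $!";

# One character, several bytes internally, UTF8 flag on.
my $buf = "\x{263A}";

open my $fh, '<:raw', $file or die "open: $!";
my $n = sysread($fh, $buf, 1, 1);    # LENGTH 1, OFFSET 1
defined $n or die "sysread: $!";
close $fh;

# Report what ended up in the buffer, character by character.
printf "read: %d, length now: %d, internally valid: %s\n",
    $n, length($buf), (utf8::valid($buf) ? 'yes' : 'no');
printf "chars: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $buf;
```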

I think that if OFFSET is 0, the string's utf8-ness should match the file's IO encoding layer, i.e. the read should produce the same result as the slurp above, regardless of what the original scalar contained. And if OFFSET is not zero, then what? Perhaps the documentation should state more clearly which combinations should never be used.
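Until that is settled, a possible workaround is to never hand sysread a scalar with history. The sketch below uses a hypothetical helper, read_bytes (my name, not from any module), that reads into a fresh lexical and returns it, so the result should not depend on what the caller's variable held before.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read up to $len bytes into a fresh scalar,
# so no UTF8 flag can be inherited from an earlier value.
sub read_bytes {
    my ($fh, $len) = @_;
    my $chunk = '';
    my $n = sysread($fh, $chunk, $len);
    die "sysread failed: $!" unless defined $n;
    return $chunk;
}

# Recreate the one-byte file (0xB5) in a temporary location.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xB5";
close $out or die "close: $!";

my $s1 = ' ';
utf8::upgrade($s1);             # flagged, like the scalar in the post

open my $fh, '<:raw', $file or die "open: $!";
$s1 = read_bytes($fh, 1);       # assignment replaces old value and flag
close $fh;

printf "UTF8 flag: %s, length: %d\n",
    (utf8::is_utf8($s1) ? 'on' : 'off'), length($s1);
```

The cost is one extra copy per read, which is usually negligible next to the system call itself.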

BTW, it looks like it's about this bug. Tk passes the file name as utf8, and this parameter is then (rather recklessly) re-used (!) to receive the file content.