http://www.perlmonks.org?node_id=1010601

ribasushi has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE (solved, for some value of "solved")

It turns out this is a known bug, which surprisingly has not been fixed yet. In the meantime, "do not do this if it hurts" seems to be the best course of action :(

/UPDATE

Greetings venerable monks,

While I (humbly) think I have a rather good grasp of how Unicode is handled by perl on input and output, I find myself stumped by the following example:

perl -e ' my $str = { map { $_ => "\x{A9}" } qw(byte char) }; utf8::upgrade($str->{char}); for (keys %$str) { open (my $fh, "<", \do{$str->{$_}}); printf( "$_ is read as %s\n", unpack "H*", <$fh>); } printf "Strings are: %s\n", ($str->{byte} eq $str->{char} ? "equal" : "different") ; '

I understand why "char" and "byte" are considered equal. What I do not understand is why the internal storage details "leak" through the in-memory filehandle.

Explanations welcome!
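As a sketch of the "do not do this if it hurts" advice from the update, and assuming the data only ever holds code points up to 0xFF, one can normalize the internal representation of a copy before opening the handle. The bytes read back then no longer depend on whether the original happened to be upgraded. This is a workaround rather than a fix for the bug, and the `%hex` hash is only there for illustration. If the data may contain wider characters, encoding it explicitly (for example with `utf8::encode`) is the well-defined option.

```perl
use strict;
use warnings;

my %str = map { $_ => "\x{A9}" } qw(byte char);
utf8::upgrade($str{char});            # same string, UTF-8 internal storage

my %hex;                              # illustrative: hex dump of what each handle yields
for my $key (sort keys %str) {
    my $copy = $str{$key};            # work on a copy; leave the original untouched
    # Force one-byte-per-character storage. With the second (FAIL_OK) argument
    # this returns false instead of dying if a code point above 0xFF is present.
    utf8::downgrade($copy, 1)
        or die "$key: contains code points above 0xFF; encode it explicitly instead\n";
    open my $fh, '<', \$copy or die "open: $!";
    local $/;                         # slurp the whole in-memory handle
    my $data = <$fh>;
    $hex{$key} = unpack 'H*', $data;
    printf "%s is read as %s\n", $key, $hex{$key};
}
```

With the copy downgraded, both keys should read back the same single octet.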


Re: Confusion over utf8 and in-memory filehandles
by Anonymous Monk on Dec 28, 2012 at 00:56 UTC

    What do you mean, how do they leak through?

    byte is read as a9
    char is read as c2a9
    Strings are: equal

    \xa9 is U+00A9, i.e. ord 169; c2a9 is ord 169 UTF-8-encoded, and after you decode it, it is chr 169 again.
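    The arithmetic above can be checked with the core `utf8::encode` and `utf8::decode` functions: encoding U+00A9 should produce the two octets C2 A9, and decoding them should give back character 169. A minimal sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $char  = "\x{A9}";           # U+00A9 COPYRIGHT SIGN, ord 169
my $bytes = $char;
utf8::encode($bytes);           # in place: now the octets 0xC2 0xA9
my $hex = unpack 'H*', $bytes;
printf "encoded: %s (%d bytes)\n", $hex, length $bytes;

my $back = $bytes;
utf8::decode($back) or die "not valid UTF-8";   # in place: back to one character
printf "decoded: chr %d, equal to original: %s\n",
    ord($back), ($back eq $char ? 'yes' : 'no');
```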

      It is the same string with the utf8 flag switched on. As such, I would expect the same bytes to be available when reading it through a filehandle. Yet the utf8-ness (which is described as an internal implementation detail all over the docs) is "visible" in this case.

      I am not sure I understand why, nor can I find any relevant perldoc.
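      To make this point concrete: by Perl's string semantics the two values are indistinguishable, and only the internal flag, reported by `utf8::is_utf8` (a function meant mostly for debugging), tells them apart. That flag is what the in-memory handle ends up exposing. A small sketch, with the `%check` hash as an illustrative name:

```perl
use strict;
use warnings;

my $byte = "\x{A9}";            # code point < 0x100: stored as a single byte
my $char = "\x{A9}";
utf8::upgrade($char);           # same character, UTF-8 internal storage

my %check = (
    equal        => ($byte eq $char                 ? 1 : 0),
    same_length  => (length($byte) == length($char) ? 1 : 0),
    same_ord     => (ord($byte) == ord($char)       ? 1 : 0),
    byte_flagged => (utf8::is_utf8($byte)           ? 1 : 0),
    char_flagged => (utf8::is_utf8($char)           ? 1 : 0),
);
printf "%-12s %d\n", $_, $check{$_} for sort keys %check;
```

      Everything a program normally looks at agrees; only the flag differs.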