in reply to Re: using binmode() to override default encoding specified in "use open"
in thread using binmode() to override default encoding specified in "use open"

Thank you — working solutions highly appreciated!

I confess I don't understand this bit:

binmode adds to the defaults, rather than replacing them.

What does it even mean for a stream to have more than one encoding associated with it? The point of associating an encoding with a stream is to reduce ambiguity; a stream saying "eh, the encoding might be this or that" only increases ambiguity.

Re^3: using binmode() to override default encoding specified in "use open"
by hippo (Chancellor) on Jul 22, 2020 at 12:25 UTC

    I tend to agree with you that having multiple ":encoding" layers doesn't make a lot of sense. There may be a scenario where it does but I can't think of one right now.

    The docs for perliol include this gem:

    binmode() operates similarly to open(): by default the specified layers are pushed on top of the existing stack.

    So that agrees with what we see, and might well make sense for non-competing layers. It does rather appear that to override a specified default encoding with binmode you will need to do the reset first.
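A minimal sketch of the stacking and the reset described above, under some assumptions not taken from the thread: the default is set with `use open OUT => ':encoding(UTF-8)'`, the scratch file comes from File::Temp, and `encoding_layers` is a hypothetical helper. The reset here uses the `:raw` pseudo-layer, which pops layers that would alter a binary stream (so on Windows it would also drop `:crlf`).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use open OUT => ':encoding(UTF-8)';   # assumed default output layer

# Hypothetical helper: count the :encoding layers currently on a handle.
sub encoding_layers {
    my ($fh) = @_;
    return scalar grep { /^encoding/ } PerlIO::get_layers($fh, output => 1);
}

# tempfile() only supplies a scratch path; the handle under test is opened
# below, inside the lexical scope of "use open".
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open(my $out, '>', $path) or die "open $path: $!";
my $by_default = encoding_layers($out);

binmode($out, ':encoding(iso-8859-1)');       # pushed on top of the default
my $after_push = encoding_layers($out);

binmode($out, ':raw:encoding(iso-8859-1)');   # reset first, then push one layer
my $after_reset = encoding_layers($out);

print {$out} "\x{AE}";                        # U+00AE REGISTERED SIGN
close($out) or die "close $path: $!";

# Read the file back without decoding to see the bytes actually written.
open(my $in, '<:raw', $path) or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

printf "encoding layers: default=%d, after push=%d, after reset=%d\n",
    $by_default, $after_push, $after_reset;
printf "bytes written: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

Calling `binmode($out)` with no layer argument and then `binmode($out, ':encoding(iso-8859-1)')` as two separate steps should have the same effect.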

Re^3: using binmode() to override default encoding specified in "use open"
by jcb (Parson) on Jul 23, 2020 at 04:21 UTC

    The :encoding(...) that you pass to binmode is an example of a PerlIO layer, and the reason you can have more than one is that the layer system is supposed to be generic and usable for more than just encodings.

    In practice, associating multiple :encoding(...) layers to a read stream would mean that the data gets "decoded" more than once. This is almost certainly an error, but might be just what you need to fix some bizarre cases of mis-encoded data.
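To illustrate what "decoded more than once" means, here is a small sketch that calls Encode's `decode` twice by hand instead of stacking layers. The sample character and variable names are made up for illustration, and this shows the idea only, not how PerlIO behaves internally when two `:encoding(...)` layers are stacked.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $original = "\x{E9}";                  # U+00E9, e with acute accent

# Simulate mis-encoded data: the text was encoded as UTF-8, and the
# resulting bytes were then mistaken for characters and encoded again.
my $once  = encode('UTF-8', $original);   # bytes C3 A9
my $twice = encode('UTF-8', $once);       # bytes C3 83 C2 A9

# One decode only undoes the outer encoding; a second one is needed.
my $step1 = decode('UTF-8', $twice);      # still looks like "Ã©"
my $fixed = decode('UTF-8', $step1);

print "recovered after two decodes\n" if $fixed eq $original;
```

On well-formed input the second decode would corrupt the data rather than repair it, which is why the double decode is almost always a mistake.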

      Makes sense — thank you.

      The question is purely academic at this point, as I have my solution, but I'm curious what multiple encoding layers mean in an output stream. In my original example, if I replace \N{WHITE SMILING FACE} with the ISO 8859-1 character \N{REGISTERED SIGN}, the output file contains this character in 8859-1 encoding (the single byte \xAE). But if I then reverse the order of the encodings, the output file replaces this character with the (all-ASCII) string \xFFFD. \xFFFD seems completely unrelated to the REGISTERED SIGN character's encoding in either ISO 8859-1 or UTF-8.

      In other words, while I can see the use case you speak of for dealing with malformed input, I can't really see the use case for generating output unrelated to the content of the string. Perl does throw a warning, upon write, about being unable to properly handle the character, but it seems like it really ought to be warning at the moment a second encoding is put on the output stream, telling the user this is likely to generate garbage.

        U+FFFD is the Unicode REPLACEMENT CHARACTER, a substitute for a character that could not be properly encoded. It is very unlikely that multiple encodings will produce useful results, but they could be useful for deliberately generating malformed output to feed to some other program that expects that type of malformed input.

        I agree that pushing a second :encoding layer should produce a warning, though.
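As a rough illustration of where an FFFD in the output can come from, the sketch below reproduces the two steps by hand with Encode. It is one plausible route under stated assumptions, not a trace of what the stacked layers actually do: it assumes the Latin-1 byte gets decoded as UTF-8 somewhere along the way, and that the second encoder escapes unmappable characters in the `FB_PERLQQ` style.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# REGISTERED SIGN as a single ISO 8859-1 byte.
my $latin1_bytes = "\xAE";

# A lone AE byte is not valid UTF-8, so a lenient decode substitutes
# U+FFFD REPLACEMENT CHARACTER for it.
my $decoded = decode('UTF-8', $latin1_bytes);

# U+FFFD has no ISO 8859-1 representation; with FB_PERLQQ the encoder
# writes an all-ASCII escape sequence in its place.
my $escaped = encode('iso-8859-1', $decoded,
    Encode::FB_PERLQQ() | Encode::LEAVE_SRC());

printf "decoded code point: U+%04X\n", ord $decoded;
print  "encoded output: $escaped\n";
```

Either way, the escape text describes the replacement character, not the REGISTERED SIGN, which is why it looks unrelated to the original string.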