raygun has asked for the wisdom of the Perl Monks concerning the following question:

I seek the wisdom of the Monks in selecting the encoding of an output stream in Perl v5.30.1.

The framework: all I/O should default to a specific encoding, but under certain conditions, one output stream (which may be stdout, a file, or a pipe) may need to have a different encoding.

The default encoding is simple to achieve: perldoc open documents pragmas for this.

Because the output stream under consideration may be STDOUT — which never has an explicit open() — it's more efficient and less repetitive to assign the output encoding via binmode() in one place (after the open() call), rather than including conditionals to either reassign STDOUT's encoding or explicitly specify an encoding at open() time.

The only minor flaw in this plan is that it doesn't work.

#!/usr/bin/perl -w
use open qw/:std :encoding(iso-8859-1)/; # default I/O encoding
my $s = "A \N{WHITE SMILING FACE} for you\n";
open (FILE, '> fpo'); # in the actual code, may open one of several things, or assign STDOUT to FILE
binmode(FILE, ':encoding(utf8)') if 1; # override the default encoding under certain conditions
warn "About to print"; # primitive trace statement
print FILE "$s";

This generates the stderr:

About to print at ./fp line 11.
"\x{263a}" does not map to iso-8859-1.
and, per the warning, creates an output file with the ASCII string \x{263a} in place of the UTF-8 character. The order of the stderr lines tells me the print call is what generated the mapping warning.

This snippet gives me the desired output if I remove the use open pragma, but then of course I lose the defaults needed for all other cases.

Any insights into the mysteries of I/O encoding are appreciated!

Re: using binmode() to override default encoding specified in "use open"
by hippo (Chancellor) on Jul 22, 2020 at 10:48 UTC

    I see the same as you in my local Perl (v5.20.3). Inspecting the layers shows that binmode adds to the defaults, rather than replacing them. You need to call it first with no layers in order to do a full reset. (Note: I've corrected the argument to encoding too.)

    #!/usr/bin/perl -w
    use open qw/:std :encoding(iso-8859-1)/; # default I/O encoding
    my $s = "A \N{WHITE SMILING FACE} for you\n";
    open (FILE, '> fpo'); # in the actual code, may open one of several things, or assign STDOUT to FILE
    my @layers = PerlIO::get_layers(FILE);
    print "Layers before binmode: @layers\n";
    binmode(FILE, ':encoding(UTF-8)') if 1; # override the default encoding under certain conditions
    @layers = PerlIO::get_layers(FILE);
    print "Layers after binmode: @layers\n";
    binmode(FILE) if 1; # reset to raw
    binmode(FILE, ':encoding(UTF-8)') if 1; # add our new encoding
    @layers = PerlIO::get_layers(FILE);
    print "Layers after reset: @layers\n";
    warn "About to print"; # primitive trace statement
    print FILE "$s";

    And the output to terminal is

    Layers before binmode: unix perlio encoding(iso-8859-1) utf8
    Layers after binmode: unix perlio encoding(iso-8859-1) utf8 encoding(utf-8-strict) utf8
    Layers after reset: unix perlio encoding(utf-8-strict) utf8
    About to print at /tmp/11119633.pl line 20.

    There may be a neater way but this is at least a working solution, AFAICT.

      binmode(FILE) if 1; # reset to raw
      binmode(FILE, ':encoding(UTF-8)') if 1; # add our new encoding
      This can be shortened to
      binmode(FILE, ':raw:encoding(UTF-8)') if 1;
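For anyone who wants to try the single-call form end to end, here is a minimal sketch. The helper name set_encoding and the temporary file are my own additions for illustration, not something from the posts above; the helper should work the same way on STDOUT or a pipe, but I have only exercised it on a plain file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use open qw/:std :encoding(iso-8859-1)/;   # default I/O encoding, as in the question
use File::Temp qw(tempfile);

# Hypothetical helper: drop the existing encoding layer(s) on $fh and
# install exactly one new encoding layer in a single binmode call.
sub set_encoding {
    my ($fh, $encoding) = @_;
    binmode($fh, ":raw:encoding($encoding)")
        or die "binmode failed: $!";
    return;
}

# Get a scratch file name, then open it ourselves so that the
# lexically scoped "use open" defaults apply to the handle.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

open(my $fh, '>', $path) or die "Can't create \"$path\": $!\n";
my @before = PerlIO::get_layers($fh);
set_encoding($fh, 'UTF-8');
my @after = PerlIO::get_layers($fh);

print {$fh} "A \x{263A} for you\n";        # U+263A WHITE SMILING FACE
close($fh) or die "Can't close \"$path\": $!\n";

# Read the file back as raw bytes to see what actually landed on disk.
open(my $in, '<:raw', $path) or die "Can't read \"$path\": $!\n";
my $bytes = do { local $/; <$in> };
close($in);

print "Layers before: @before\n";
print "Layers after:  @after\n";
printf "Bytes written: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

Wrapping the call in one helper keeps the override in a single place, which is what the original question was after.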

      Thank you — working solutions highly appreciated!

      I confess I don't understand this bit:

      binmode adds to the defaults, rather than replacing them.

      What does it even mean for a stream to have more than one encoding associated with it? The point of associating an encoding with a stream is to reduce ambiguity; a stream saying "eh, the encoding might be this or that" only increases ambiguity.

        I tend to agree with you that having multiple ":encoding" layers doesn't make a lot of sense. There may be a scenario where it does but I can't think of one right now.

        The docs for PerlIO include this gem:

        binmode() operates similarly to open(): by default the specified layers are pushed on top of the existing stack.

        So that agrees with what we see, and might well make sense for non-competing layers. It does rather appear that to override a specified default encoding with binmode you will need to do the reset first.

        The :encoding(...) that you pass to binmode is an example of a PerlIO layer, and the reason you can have more than one is that the layer system is supposed to be generic and usable for more than just encodings.

        In practice, associating multiple :encoding(...) layers to a read stream would mean that the data gets "decoded" more than once. This is almost certainly an error, but might be just what you need to fix some bizarre cases of mis-encoded data.

Re: using binmode() to override default encoding specified in "use open"
by ikegami (Pope) on Jul 23, 2020 at 04:27 UTC

    Just use

    open(FILE, '>:encoding(UTF-8)', 'fpo');

    Notes:

    • Don't needlessly use global vars. Use open(my $FILE, ...) instead of open(FILE, ...).
    • Please avoid the two-arg form of open.
    • open is very prone to failing, so you should add some error checking.
    • "utf8" is an extension to UTF-8 used by Perl internally. You want "UTF-8", not "utf8". (Case doesn't matter.)
    my $qfn = 'fpo';
    open(my $FILE, '>:encoding(UTF-8)', $qfn)
        or die("Can't create \"$qfn\": $!\n");

      Thank you for the response. The fourth paragraph of my initial post explains why I'm using binmode rather than this solution.

      Your additional notes are good advice, but I omitted error checking, etc., from my example because I simplified my code to include only the relevant bits. I apologize for not stating this explicitly; I wrongly presumed it was clear from constructions like if 1 that are pointless in production code. (In practice I use autodie to avoid having to individually check every open with identical logic or to write my own open wrapper.)

Re: using binmode() to override default encoding specified in "use open"
by Anonymous Monk on Jul 22, 2020 at 10:45 UTC
    Try UTF-8

      Hm. :encoding() not being a function, and perldoc encoding talking about something unrelated, it wasn't immediately clear where to find :encoding()'s documentation. I ended up choosing utf8 over UTF-8 because perldoc Encode::Supported, which I found when casting about for encoding info, indicated the former form was canonical. But now I suspect I should have been looking at perldoc Encode instead.

      But I'm still not completely sure. Perl's encoding documentation is a twisty maze of little passages, all different, and I've not yet found a signpost definitively stating "These are the legal arguments to :encoding() and what they mean."
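As far as I can tell, the :encoding() layer hands its argument to Encode, so one way to get a working list of names is to ask Encode directly. A small sketch, assuming only the core Encode module (the particular spellings I probe are arbitrary examples):

```perl
use strict;
use warnings;
use Encode ();

# Canonical names of every encoding this installation of Encode can load.
my @names = Encode->encodings(':all');
printf "%d encodings available\n", scalar @names;

# Resolve a few spellings (aliases included) to their canonical names.
my %canonical;
for my $spelling (qw(UTF-8 utf8 latin1 iso-8859-1)) {
    my $enc = Encode::find_encoding($spelling);
    $canonical{$spelling} = $enc ? $enc->name : undef;
    printf "%-12s => %s\n", $spelling, $canonical{$spelling} // '(unknown)';
}
```

If find_encoding returns undef for a name, I would not expect :encoding() to accept that name either.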

        That is incorrect. "utf8" is an extension to UTF-8 used by Perl internally. "UTF-8" is the standard encoding. (The names are case-insensitive.)

        I have no idea why that page says they are equivalent. They are not. See :encoding(UTF-8) vs :encoding(utf8) vs :utf8.
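A quick way to see that the two names are different encoders is to ask Encode what each resolves to, then feed both a code point above U+10FFFF, which is outside Unicode. This is only a sketch; the choice of 0x110000 as the probe value is mine, and I have left out expected results for the encode step because I would rather you run it on your own Perl than trust my memory.

```perl
use strict;
use warnings;
use Encode ();

my $strict = Encode::find_encoding('UTF-8');
my $lax    = Encode::find_encoding('utf8');
print "UTF-8 resolves to ", $strict->name, "\n";   # utf-8-strict
print "utf8  resolves to ", $lax->name,    "\n";   # utf8

# Probe value: one past the last valid Unicode code point.
my $beyond = chr(0x110000);

for my $name ('UTF-8', 'utf8') {
    my $copy = $beyond;   # encode() may modify its input when a CHECK value is given
    my $ok = eval { Encode::encode($name, $copy, Encode::FB_CROAK()); 1 };
    print "$name: ", ($ok ? 'encoded' : 'rejected'), "\n";
}
```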