http://www.perlmonks.org?node_id=255758


in reply to how to turn off warning: wide character in print ?

I raised what I'm guessing are related problems in "Setting UTF-8 mode on filehandle reads?" a few months back. The most useful replies in that thread came from diotalevi and grantm (both titled "Re: Setting UTF-8 mode on filehandle reads?"). (May their tribes increase.)

I have Perl 5.8 now (hurray for getting off Windows!), so I wrote a little test script to see how far one can get by pushing encoding layers onto filehandles:

    #!/usr/bin/perl -w

    use strict;
    use utf8;

    # without the encoding layer
    open(my $fh1, ">", "test-normal") or die "test-normal: $!";
    # with the encoding layer
    open(my $fh2, ">:utf8", "test-utf8") or die "test-utf8: $!";

    # ASCII data
    # encodes the same in UTF-8 or Latin-1 encodings
    my $ascdata = "aei\n";
    print $fh1 $ascdata;
    print $fh2 $ascdata;

    # accented a e i
    my $l1data = "\xe1\xe9\xed\n";
    # these characters *can* be encoded in Latin-1 or in UTF-8
    # (though differently for each)
    print $fh1 $l1data;
    print $fh2 $l1data;

    # U+0641 ARABIC LETTER FEH
    my $u8data = "\x{0641}\n";
    # "Arabic-Feh" can't be encoded in Latin-1, but can be encoded in UTF-8
    print $fh1 $u8data;    # <--THIS LINE GENERATES WARNING
    print $fh2 $u8data;

    close($fh1);
    close($fh2);
Here are the results, checked with od:
    [jeremy@serpent pm-test]$ perl wide-char.pl
    Wide character in print at wide-char.pl line 27.
    [jeremy@serpent pm-test]$ od -t x1 test-normal
    0000000 61 65 69 0a e1 e9 ed 0a d9 81 0a
    0000013
    [jeremy@serpent pm-test]$ od -t x1 test-utf8
    0000000 61 65 69 0a c3 a1 c3 a9 c3 ad 0a d9 81 0a
    0000016
    [jeremy@serpent pm-test]$
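To read those od dumps by hand, here is a small sketch (not part of the original test) that prints the UTF-8 byte sequence for each character the script wrote, using encode() from the core Encode module. Every non-ASCII character here takes the two-byte form 110xxxxx 10xxxxxx: U+0641 is 11001 000001 in binary, which becomes 11011001 10000001, i.e. d9 81, matching the tail of both files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Characters written by the test script: a, a-acute, e-acute, i-acute, Arabic feh.
for my $cp (0x61, 0xE1, 0xE9, 0xED, 0x0641) {
    my $bytes = encode('UTF-8', chr($cp));    # character -> UTF-8 byte string
    my @hex   = map { sprintf('%02x', ord($_)) } split //, $bytes;
    printf "U+%04X -> %s\n", $cp, join(' ', @hex);
}
# U+0061 -> 61
# U+00E1 -> c3 a1
# U+00E9 -> c3 a9
# U+00ED -> c3 ad
# U+0641 -> d9 81
```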

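I recognize this isn't exactly what you asked, but I suspect the utf8 pragma and the :utf8 encoding layer are getting mixed up somewhere in your code, or one of them is missing. Keep in mind that they do different jobs: use utf8 only tells Perl that your source code is written in UTF-8, while the :utf8 layer controls how characters are encoded on a filehandle.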

More specifically, it sounds like you're printing a character whose ord value is larger than 0xff to a filehandle with no encoding layer, which Perl treats as Latin-1 (one byte per character). Such characters can't be encoded in Latin-1, so you're running the risk of losing or garbling data. That's a real problem, and I encourage you to track it down: turning off a warning isn't the same as fixing its cause.
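Here is a minimal sketch of the usual ways to fix the cause rather than hide the warning, assuming UTF-8 is the output encoding you want: open the handle with an :encoding(UTF-8) layer, encode the string to bytes yourself with the core Encode module, or binmode an already-open handle such as STDOUT. The file names feh-layer.txt and feh-bytes.txt are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Record any warnings so we can confirm "Wide character" never fires.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $feh = "\x{0641}\n";    # U+0641 ARABIC LETTER FEH: above 0xff, so not Latin-1

# Fix 1: open the file with an encoding layer; the layer turns
# characters into UTF-8 bytes on the way out.
open(my $layered, '>:encoding(UTF-8)', 'feh-layer.txt') or die "open: $!";
print $layered $feh;
close($layered) or die "close: $!";

# Fix 2: keep a plain byte filehandle, but encode explicitly first;
# encode() returns a byte string, so nothing in it is "wide".
open(my $plain, '>', 'feh-bytes.txt') or die "open: $!";
print $plain encode('UTF-8', $feh);
close($plain) or die "close: $!";

# Fix 3: for a handle that is already open (STDOUT is the usual
# culprit), push the layer with binmode before printing.
binmode(STDOUT, ':encoding(UTF-8)');
print $feh;

print "warnings seen: ", scalar(@warnings), "\n";
```

Both files end up with the bytes d9 81 0a, the same as test-utf8 in the script above, and no warning is raised.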

It would help if you posted a snippet that triggers the warning. Warnings are usually there for a reason, and perhaps there's something in your code that looks a bit sketchy from Perl's point of view.

Hope that helps.

Update: I've just noticed how Perl copes with characters greater than 0xff on an output filehandle without a UTF-8 layer: it writes the UTF-8 encoding of that character anyway. Note that the last two bytes before the final newline are d9 81 in both files.
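As far as I can tell, the fallback applies to the whole string being printed, not just the wide character: if a string contains any character above 0xff, every character in it comes out as UTF-8, including ones that would otherwise have been written as single Latin-1 bytes. This sketch (the file name mixed-demo is a placeholder) prints the same e-acute twice to a plain filehandle, once alone and once next to an Arabic feh.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Swallow only the expected "Wide character" warning for this demo.
local $SIG{__WARN__} = sub { warn @_ unless $_[0] =~ /^Wide character/ };

open(my $fh, '>', 'mixed-demo') or die "open: $!";   # no encoding layer
print $fh "\xe9\n";            # fits in one byte: written as e9
print $fh "\xe9\x{0641}\n";    # contains a wide char: whole string written as UTF-8
close($fh) or die "close: $!";

# Read the file back as raw bytes and dump them in hex.
open(my $in, '<:raw', 'mixed-demo') or die "open: $!";
my $bytes = do { local $/; <$in> };
close($in);
print join(' ', map { sprintf('%02x', ord($_)) } split //, $bytes), "\n";
# prints: e9 0a c3 a9 d9 81 0a
```

The same character shows up as e9 on the first line and as c3 a9 on the second, which is exactly the ambiguity described next.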

No wonder you get a warning: once the bytes are written, there's no systematic way to tell whether any given part of the output was originally UTF-8 or not!

Update 2: Cleaned up comments by using the word "encoded" instead of "printed".