PerlMonks  

Re: Malformed UTF-8 character

by Eliya (Vicar)
on Apr 29, 2011 at 20:16 UTC


in reply to Malformed UTF-8 character

Others have explained why Perl complains, at least when the string literal is declared with double quotes.  In other words, your source is apparently not encoded in UTF-8, even though the use utf8 pragma tells Perl that it is.

What I find more surprising is that Perl doesn't complain when - within the scope of use utf8 - the string literal (containing a Latin-1 encoded char like '°') is declared using single quotes.  I'd say the latter is a bug (unless I've overlooked something in the docs... :)

(I can replicate the issue here with 5.12.2.)
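
To check the diagnosis above (source bytes that are not valid UTF-8), something along these lines might help. It is only a sketch: is_valid_utf8 is a hypothetical helper name, not something from the thread, and it relies on Encode's decode croaking on malformed input when given FB_CROAK.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the given byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes in place.
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# "\xB0" is the Latin-1 encoding of the degree sign, "\xC2\xB0" the UTF-8 one.
for my $sample ("\xB0C", "\xC2\xB0C") {
    printf "%-12s %s\n",
        join(' ', map { sprintf '%02X', ord } split //, $sample),
        is_valid_utf8($sample) ? 'valid UTF-8' : 'not valid UTF-8';
}
```

To test a script rather than a string, read the file through a ':raw' handle and pass the slurped contents to the helper.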

Re^2: Malformed UTF-8 character
by tchrist (Pilgrim) on Apr 30, 2011 at 16:31 UTC
    Eliya wrote:
    What I find more surprising is that Perl doesn’t complain when — within the scope of use utf8 — the string literal (containing a Latin‑1 encoded char like '°') is declared using single quotes. I’d say the latter is a bug (unless I've overlooked something in the docs... :)

    I can confirm it still occurs in 5.14 RC0:

    % blead -C0 -le 'print qq(print "\xB0C";)' | blead -Mutf8 -CS -l
    Malformed UTF-8 character (unexpected continuation byte 0xb0, with no preceding start byte) at - line 1.
    C
    % blead -C0 -le 'print qq(print \x27\xB0C\x27;)' | blead -Mutf8 -CS -l
    #C
    Oops.
Re^2: Malformed UTF-8 character
by Steve_BZ (Chaplain) on Apr 30, 2011 at 13:23 UTC

    Thanks for this. Do I have the use statements right?

    use utf8;
    use Encode;
    binmode STDOUT, ":utf8";
    use open ':encoding(utf8)';

    I'm not quite sure what they do or how they differ. I understand that use utf8 is to save the code page in utf-8 and that use Encode provides utilities to encode and decode, but what do binmode STDOUT, ":utf8"; and use open ':encoding(utf8)'; do?

    Regards

    Steve

      I understand that use utf8 is to save the code page in utf-8

      Not really sure what you mean by that... but use utf8 tells Perl that the source code (string literals, etc.) is encoded in UTF-8. So you shouldn't use it if that's not the case for your script.
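
A small sketch of what that means in practice, assuming the script file itself is saved as UTF-8 (if it is saved as Latin-1, the use utf8 block is exactly the case that triggers the warning discussed in this thread). The variable names are only for illustration.

```perl
use strict;
use warnings;

my $as_bytes;
{
    no utf8;            # literal is taken as the raw bytes found in the file
    $as_bytes = "°C";
}

my $as_chars;
{
    use utf8;           # literal is decoded from UTF-8 into characters
    $as_chars = "°C";
}

# Only lengths are printed, so no output layer is needed here.
printf "without use utf8: length %d\n", length $as_bytes;
printf "with use utf8:    length %d\n", length $as_chars;
```

In a UTF-8 encoded file the degree sign occupies two bytes, so the first length counts bytes (3) while the second counts characters (2).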

      binmode STDOUT, ":utf8" sets the utf8 PerlIO layer for STDOUT, which tells Perl that you want UTF-8 encoded output for that file handle.

      use open ':encoding(utf8)' declares the default layer for I/O streams, i.e. you don't have to explicitly specify the respective layer when you open a file.  See the open pragma for the details.
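
The two I/O settings described above could be exercised roughly like this. It is a sketch under assumptions: the script is saved as UTF-8, the scratch file name is made up, and ':encoding(UTF-8)' (the strict variant) is used where the question has ':utf8' and ':encoding(utf8)'.

```perl
use strict;
use warnings;
use utf8;                             # literals below are UTF-8 in the source
use open ':encoding(UTF-8)';          # default layer for open() in this scope

binmode STDOUT, ':encoding(UTF-8)';   # encode characters written to STDOUT

my $text = "25 °C";
my $path = "utf8_demo_$$.txt";        # hypothetical scratch file

# No layer given: the default from `use open` applies on write and read.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

open my $in, '<', $path or die "Cannot read $path: $!";
my $back = do { local $/; <$in> };
close $in;

# An explicit layer overrides the default; ':raw' returns the bytes on disk.
open my $raw, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
unlink $path;

print "read back: $back\n";
printf "characters: %d, bytes on disk: %d\n", length $text, length $bytes;
```

The character count and the byte count differ because the degree sign is one character but two bytes once encoded as UTF-8.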

        Thanks for this.

        Regards

        Steve
