Re: Malformed UTF-8 character

by ikegami (Patriarch)
on Nov 30, 2022 at 14:08 UTC (node #11148459)

in reply to Malformed UTF-8 character

That indicates a scalar that became corrupted when Perl or XS code improperly decoded a string.
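
To make "corrupt scalar" concrete, here is a minimal sketch that simulates the bug rather than reproducing a real one. It uses Encode::_utf8_on (documented in Encode, but as an internal function) to flag a byte string as UTF-8 without validating it, which is roughly what a non-validating decoder does, then asks utf8::valid whether the scalar's internal representation is consistent. The variable names and the choice of byte are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# One byte that is not valid UTF-8 on its own: 0x96 is a continuation
# byte with no preceding start byte (in cp1252 it is an EN DASH).
my $bytes = "\x96";

# Left alone, it is a perfectly consistent byte string.
my $ok = $bytes;

# Simulate an improper decode: turn the UTF8 flag on without validating.
my $corrupt = $bytes;
Encode::_utf8_on($corrupt);

# utf8::valid checks that the internal representation is consistent.
printf "byte string valid: %d\n", utf8::valid($ok)      ? 1 : 0;
printf "flagged copy valid: %d\n", utf8::valid($corrupt) ? 1 : 0;
```

The first line reports 1 and the second 0: the bytes are identical, but only the flagged copy claims to be UTF-8, and that claim is false.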

For example, use utf8; doesn't validate if the source code is actually valid UTF-8, and produces corrupt scalars if it's not.

    $ not_utf8="$( printf "\x96" )"
    $ perl -e"use utf8; q{$not_utf8}"
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) at -e line 1.
    Malformed UTF-8 character (fatal) at -e line 1.

(Fortunately, use utf8; catches the problem and bails.)

Are you using use utf8; with a source file that isn't encoded using UTF-8?

The likely culprit is a U+2013 EN DASH ("–") encoded using cp1252, where it is the single byte 0x96 seen in the error message.
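
One way to confirm that diagnosis is to check the suspect file directly. A minimal sketch, assuming the file name is passed as the first argument (with no argument it falls back to an illustrative inline sample): it reads raw bytes, attempts a strict UTF-8 decode with Encode's FB_CROAK check, and on failure shows what the same bytes mean as cp1252. The helper name utf8_error is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns undef if $bytes is well-formed UTF-8,
# otherwise the error text from Encode.
sub utf8_error {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from consuming (modifying) $bytes.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? undef : $@;
}

# Raw bytes of the suspect file; the inline sample (an assumption) is
# "10", EN DASH, "12" as saved by a cp1252 editor.
my $bytes = @ARGV
    ? do {
        open(my $fh, '<:raw', $ARGV[0]) or die "$ARGV[0]: $!\n";
        local $/;    # slurp the whole file
        <$fh>;
    }
    : "10\x9612";

if (defined(my $err = utf8_error($bytes))) {
    print "Not valid UTF-8: $err";
    # Read the same bytes as cp1252 instead: 0x96 decodes to U+2013.
    my $text = decode('cp1252', $bytes);
    printf "As cp1252: %vX\n", $text;
}
else {
    print "Valid UTF-8\n";
}
```

Run with no arguments, it reports Encode's error for the sample and then prints "As cp1252: 31.30.2013.31.32", with the 2013 being the en dash.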

Using the :utf8 layer can also produce corrupt scalars, since it marks input as UTF-8 without validating it.

    $ printf "\x96" | perl -nle'
       use open ":std", ":utf8";
       printf "%vX\n", $_;
    '
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) in printf at -e line 1, <> line 1.
    0

That's why :encoding(UTF-8), which validates its input, should be used instead.
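
The same applies to handles a program opens itself. A minimal sketch, using an in-memory filehandle as a stand-in for a real file (the sample byte is chosen to match the error above): reading through :encoding(UTF-8) emits a warning about the bad byte, and the scalar that comes back is still well-formed.

```perl
use strict;
use warnings;

# Stand-in for a file saved as cp1252: an EN DASH (0x96) and a newline.
my $bytes = "\x96\n";

# :encoding(UTF-8) validates while decoding; malformed bytes trigger a
# warning and are substituted, so the resulting scalar stays well-formed.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
my $line = <$fh>;
close($fh);
chomp $line;

printf "valid: %d\n", utf8::valid($line) ? 1 : 0;
```

In the one-liner, the equivalent change is replacing ":utf8" with ":encoding(UTF-8)" in the use open line.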

Re^2: Malformed UTF-8 character
by BillKSmith (Monsignor) on Dec 03, 2022 at 13:52 UTC
    I wish that I had recognized that your "likely suspect" was the key to the whole mystery.
