PerlMonks  

Re: Malformed UTF-8 character

by ikegami (Patriarch)
on Nov 30, 2022 at 14:08 UTC (id://11148459)


in reply to Malformed UTF-8 character

That error indicates a scalar that became corrupted when Perl or XS code improperly decoded a string.

For example, use utf8; doesn't check that the source code is actually valid UTF-8, and it produces corrupt scalars if it isn't.

    $ not_utf8="$( printf "\x96" )"
    $ perl -e"use utf8; q{$not_utf8}"
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) at -e line 1.
    Malformed UTF-8 character (fatal) at -e line 1.

(Fortunately, use utf8; catches the problem and bails.)
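To see what a corrupt scalar looks like from inside Perl, here is a minimal sketch that builds one on purpose. It uses Encode::_utf8_on, an internal Encode function that sets the UTF-8 flag without checking the buffer, as a stand-in for buggy XS code, and the built-in utf8::valid to detect the inconsistency. It's for illustration only; don't flip the flag like this in real code.

```perl
use strict;
use warnings;
use Encode ();

# A byte string holding 0x96, which is not valid UTF-8 by itself.
my $bytes = "\x96";

# Set the UTF-8 flag on a copy WITHOUT decoding it, mimicking what buggy
# XS code can do. Encode::_utf8_on is internal: illustration only.
my $corrupt = $bytes;
Encode::_utf8_on($corrupt);

# utf8::valid reports whether a scalar's buffer agrees with its UTF-8 flag.
# A plain byte string (flag off) always counts as consistent.
printf "bytes:   %s\n", utf8::valid($bytes)   ? "consistent" : "corrupt";
printf "corrupt: %s\n", utf8::valid($corrupt) ? "consistent" : "corrupt";
```

The second scalar is exactly the kind of value that later triggers "Malformed UTF-8 character" when something tries to read its characters.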

Are you using use utf8; with a source file that isn't encoded using UTF-8?
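If you're not sure, one way to find out is to decode the file strictly, line by line. This is a sketch, not a standard tool: the script name check_utf8.pl and the helper bad_utf8_lines are hypothetical, and it relies on the core Encode module's decode with FB_CROAK, which dies on malformed input.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: return the 1-based numbers of the lines in $bytes
# that are not valid UTF-8.
sub bad_utf8_lines {
    my ($bytes) = @_;
    my @bad;
    my $n = 0;
    for my $line (split /^/, $bytes) {
        ++$n;
        # decode() with FB_CROAK may modify its argument, so pass a copy.
        my $copy = $line;
        push @bad, $n unless eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    }
    return @bad;
}

# Hypothetical usage: perl check_utf8.pl script.pl
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    my @bad = bad_utf8_lines($bytes);
    print @bad ? "$file: invalid UTF-8 on line(s) @bad\n" : "$file: ok\n";
}
```

Any line it reports is one where a byte that isn't part of a valid UTF-8 sequence is hiding.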

The likely culprit is a U+2013 EN DASH ("–") encoded using cp1252, where it is the single byte 0x96 from the error message.
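A short sketch using the core Encode module confirms the bytes involved: in cp1252 the en dash is the lone byte 0x96, while in UTF-8 the same character takes three bytes, none of which is 0x96.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2013 EN DASH as a Perl character string.
my $en_dash = "\x{2013}";

# In cp1252, EN DASH is the single byte 0x96.
my $cp1252 = encode('cp1252', $en_dash);
printf "cp1252: %vX\n", $cp1252;   # 96

# In UTF-8, the same character is three bytes.
my $utf8 = encode('UTF-8', $en_dash);
printf "UTF-8:  %vX\n", $utf8;     # E2.80.93

# Decoding the cp1252 byte with the right encoding recovers the character.
my $back = decode('cp1252', $cp1252);
print $back eq $en_dash ? "round trip ok\n" : "mismatch\n";
```

So a file saved as cp1252 but read under use utf8; hands Perl a bare 0x96, which UTF-8 treats as a continuation byte with no start byte.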


Using the :utf8 encoding layer can also produce corrupt scalars.

    $ printf "\x96" | perl -nle' use open ":std", ":utf8"; printf "%vX\n", $_; '
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) in printf at -e line 1, <> line 1.
    0

That's why :encoding(UTF-8), which validates its input, should be used instead.
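The difference can be seen with an in-memory file handle. This minimal sketch assumes Perl's default :encoding error handling (a warning plus a substituted escape sequence), so reading the same 0x96 byte through :encoding(UTF-8) yields a well-formed scalar. If you'd rather reject bad input outright, Encode's decode with FB_CROAK dies instead.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

my $bytes = "\x96";   # not valid UTF-8

# 1. Read through :encoding(UTF-8). Malformed input is reported with a
#    warning and replaced, so the resulting scalar is well-formed.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $line = <$fh>;
close $fh;
print utf8::valid($line) ? "layer: well-formed scalar\n" : "layer: corrupt\n";

# 2. Decode explicitly and die on malformed input instead of substituting.
#    decode() with FB_CROAK may modify its argument, so pass a copy.
my $copy = $bytes;
my $ok = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
print $ok ? "decode: accepted\n" : "decode: rejected: $@";
```

Either way, the bad byte is caught at the boundary rather than surfacing later as a corrupt scalar.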

Re^2: Malformed UTF-8 character
by BillKSmith (Monsignor) on Dec 03, 2022 at 13:52 UTC
    I wish that I had recognized that your "likely suspect" was the key to the whole mystery.
    Bill
