http://www.perlmonks.org?node_id=11148459


in reply to Malformed UTF-8 character

That indicates a scalar which become corrupted when Perl or XS code improperly decoded a string.

For example, use utf8; doesn't validate if the source code is actually valid UTF-8, and produces corrupt scalars if it's not.

$ not_utf8="$( printf "\x96" )" $ perl -e"use utf8; q{$not_utf8}" Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, wi +th no preceding start byte) at -e line 1. Malformed UTF-8 character (fatal) at -e line 1.

(Fortunately, use utf8; catches the problem and bails.)

Are you using use utf8; with a source file that isn't encoded using UTF-8?

The likely culprit is a U+2013 EN DASH ("–") encoded using cp1252.


Using the :utf8 encoding layer can also produce corrupt scalars.

$ printf "\x96" | perl -nle' use open ":std", ":utf8"; printf "%vX\n", $_; ' Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, wi +th no preceding start byte) in printf at -e line 1, <> line 1. 0

That's why :encoding(UTF-8) should be used instead.