Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^3: Malformed UTF-8 character

by pryrt (Abbot)
on Dec 01, 2022 at 19:44 UTC ( [id://11148486]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Malformed UTF-8 character
in thread Malformed UTF-8 character

The best solution probably it to download notepad++, but it seems like overkill to learn another editor to solve such a rare problem.

As much as it pains me to say it (given my Notepad++ fandom), it does seem like overkill. But iconv.exe comes with my Strawberry perl... and if it does with yours, then it can handle the translation. (Or gnuwin32's iconv). I believe one of the following two would properly translate the CP1252 encoding of the emdash into UTF-8.

iconv -f ISO-8859-1 -t utf-8 savedfile > outfile.pl iconv -f CP1252 -t utf-8 savedfile > outfile.pl

(Of course, the other fix is to not use utf8; after you download the script; perl will default to your native Windows encoding {if I understand things correctly}, so that should work -- at least, it did for me from that same downloaded source code.)

Replies are listed 'Best First'.
Re^4: Malformed UTF-8 character
by BillKSmith (Monsignor) on Dec 02, 2022 at 16:25 UTC
    Thanks for the reminder to check Strawberry (and perl) utilities occasionally.

    Note: The character in question (\x96) is one of the differences between ISO-8859-1 and CP1252. See the difference in the character starting at location 12. The savedfile is from the original post Regex: matching any Number then a hyphen.

    C:\Users\Bill\forums\monks>xxd savedfile 00000000: 3132 3334 202d 2046 6f6f 0d0a 3536 3737 1234 - Foo..5677 00000010: 3820 9620 4261 720d 0a39 3939 392e 2042 8 . Bar..9999. B 00000020: 617a 0d0a az.. C:\Users\Bill\forums\monks>iconv -f ISO-8859-1 -t utf-8 savedfile > ou +tfile.txt C:\Users\Bill\forums\monks>xxd outfile.txt 00000000: 3132 3334 202d 2046 6f6f 0d0a 3536 3737 1234 - Foo..5677 00000010: 3820 c296 2042 6172 0d0a 3939 3939 2e20 8 .. Bar..9999. 00000020: 4261 7a0d 0a Baz.. C:\Users\Bill\forums\monks>iconv -f CP1252 -t utf-8 savedfile > outfil +e.txt C:\Users\Bill\forums\monks>xxd outfile.txt 00000000: 3132 3334 202d 2046 6f6f 0d0a 3536 3737 1234 - Foo..5677 00000010: 3820 e280 9320 4261 720d 0a39 3939 392e 8 ... Bar..9999. 00000020: 2042 617a 0d0a Baz..

    Life was so much easier fifty years ago. Oh, there really were two keypunch codes.

    Bill
      > Life was so much easier fifty years ago

      If your native tongue didn't look like "šířící žďáření"...

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148486]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2025-07-18 10:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.