PerlMonks  

Re^5: Malformed UTF-8 character

by kcott (Archbishop)
on Dec 03, 2022 at 04:45 UTC ( [id://11148517]=note )


in reply to Re^4: Malformed UTF-8 character
in thread Malformed UTF-8 character

G'day Bill,

"I did not know what unicode character the \x96 was meant to represent."

A quick way to determine this is via "Unicode Character Code Charts" — it has "Find chart by hex code:" near the top of the page.

[Aside: Although that's a standard URL, I noted, when checking it, that it has: "Unicode 15.0 Character Code Charts". I thought that I'd just mention that Perl does a pretty good job of supporting the latest Unicode versions. Perl v5.36.0 (released in May this year) supports Unicode 14.0 (the current version at the time); if you're desperate for 15.0 support, it was added in v5.37.5 (or just wait for 5.38.0 to be released in May next year, or thereabouts).]

That will give you the name, <control>, and the informative alias, START OF GUARDED AREA; you can use the latter in \N{}.

$ perl -E 'say sprintf "%x", ord("\N{START OF GUARDED AREA}")'
96

In a script or one-liner, you can use Unicode::UCD, but it's not always straightforward. Compare:

$ perl -MUnicode::UCD=charinfo -E 'say charinfo(0x34)->{name}'
DIGIT FOUR
$ perl -MUnicode::UCD=charinfo -E 'say charinfo(0x34)->{unicode10} || +"<blank>"'
<blank>
$ perl -MUnicode::UCD=charinfo -E 'say charinfo(0x96)->{name}'
<control>
$ perl -MUnicode::UCD=charinfo -E 'say charinfo(0x96)->{unicode10} || +"<blank>"'
START OF GUARDED AREA
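
The comparison above suggests a simple fallback: use the name field unless it is the <control> placeholder, and in that case try unicode10. Here is a minimal sketch of that idea. The helper best_name() is hypothetical (it is not part of Unicode::UCD), and it assumes unicode10 is populated for the control characters you care about, which may not hold for every code point.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Unicode::UCD 'charinfo';

# Hypothetical helper (not part of Unicode::UCD): return the most
# descriptive name available for a code point.
sub best_name {
    my ($code) = @_;
    my $info = charinfo($code);
    return unless defined $info;    # charinfo() had nothing for this code point

    my $name = $info->{name};

    # Control characters report the placeholder '<control>' as their name;
    # fall back to the unicode10 field when it is non-empty.
    if ($name eq '<control>' && length $info->{unicode10}) {
        return $info->{unicode10};
    }
    return $name;
}

say sprintf 'U+%04X %s', $_, best_name($_) for 0x34, 0x96;
```

For the two code points used above, this should print "U+0034 DIGIT FOUR" and "U+0096 START OF GUARDED AREA".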

— Ken

Re^6: Malformed UTF-8 character
by BillKSmith (Monsignor) on Dec 03, 2022 at 13:39 UTC
    My problem was that the \x96 was not the Unicode code-point, or even the utf8 encoding of the character. I now know that it is the cp1252 encoding of \N{EN DASH}. I had forgotten that there is such a thing as cp1252!
    Bill
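
Bill's point can be checked directly: decode the byte as cp1252 and then ask Unicode::UCD for the name of the resulting character. This is only a sketch using the core Encode module; the helper cp1252_byte_info() is a hypothetical name introduced for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Encode 'decode';
use Unicode::UCD 'charinfo';

# Hypothetical helper: treat a single byte as cp1252 and report the
# code point and Unicode name of the character it decodes to.
sub cp1252_byte_info {
    my ($byte) = @_;
    my $char = decode('cp1252', $byte);    # bytes in, Perl character string out
    my $code = ord $char;
    return sprintf 'U+%04X %s', $code, charinfo($code)->{name};
}

say cp1252_byte_info("\x96");
```

This should print "U+2013 EN DASH", whereas reading \x96 as a code point gives the C1 control character discussed earlier in the thread.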
