PerlMonks  

Re^2: My UTF-8 text isn't surviving I/O as expected

by ibm1620 (Hermit)
on Nov 26, 2024 at 01:56 UTC ( [id://11162889]=note )


in reply to Re: My UTF-8 text isn't surviving I/O as expected
in thread My UTF-8 text isn't surviving I/O as expected

Reading Tom Christiansen's sobering post about Unicode was enough to discourage me from trying to become proficient with Unicode. I'm retired so I get to do that :-)

Replies are listed 'Best First'.
Re^3: My UTF-8 text isn't surviving I/O as expected
by cavac (Prior) on Nov 26, 2024 at 07:59 UTC

    On the surface, yes, it looks bad. But from my experience, you can cover nearly all cases (like 99.5% or so) by following some simple rules, no matter the encoding:

    • Convert all incoming data to Perl's internal representation (Encode::decode, utf8::decode or similar)
    • Convert all outgoing data to the correct encoding (Encode::encode to UTF-8 or similar)
    • Unless you really have to verify very specific things in text, just treat it like a random binary blob.
    • 0 + $var works for converting text to numeric values.
    • If you do any type of string comparison in your code, always normalize both sides using Unicode::Normalize and always stick to the same normalization form.
    • Don't assume that any other text encoding standard is saner, or that it is even a global standard.
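As a rough illustration of the rules above, here is a minimal sketch using the core Encode and Unicode::Normalize modules. The helper names (from_wire, to_wire, same_text) are made up for this example, and the choice of NFC as the normalization form is an assumption; any form works as long as you stick to the same one everywhere.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: incoming UTF-8 bytes -> Perl character string.
# FB_CROAK makes malformed input die instead of being silently patched up.
sub from_wire {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical helper: Perl character string -> outgoing UTF-8 bytes.
sub to_wire {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# Hypothetical helper: compare two character strings after normalizing
# both sides to the same form (NFC is assumed here).
sub same_text {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right);
}

# The same word twice: once with a precomposed e-acute (U+00E9),
# once as "e" followed by a combining acute accent (U+0301).
my $composed   = from_wire("caf\xC3\xA9");
my $decomposed = from_wire("cafe\xCC\x81");

my $plain_equal      = ($composed eq $decomposed)         ? 1 : 0;
my $normalized_equal = same_text($composed, $decomposed)  ? 1 : 0;

# Encode again only at the output boundary.
my $out_bytes = to_wire(NFC($decomposed));

# Numeric conversion of text, as in the list above.
my $number = 0 + "42";

printf "plain eq: %d, normalized eq: %d, number: %d\n",
    $plain_equal, $normalized_equal, $number;
```

The two strings compare unequal with a plain eq even though they render identically, which is why the normalization step matters for comparisons. Everything between from_wire and to_wire works on characters, not bytes.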

    The basic ugliness of Unicode (or other text encodings) stems not from their engineers but from the basic fact that human language is a complicated mess. And written language is still a somewhat new concept in human evolution and we are still trying to figure out the finer details. At least with Unicode, you don't have to constantly switch schemes depending on who is using your software.

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
    Also check out my sisters artwork and my weekly webcomics
