PerlMonks  

Re^12: Converting Unicode

by jeffenstein (Hermit)
on Dec 06, 2023 at 09:28 UTC [id://11156131]


in reply to Re^11: Converting Unicode
in thread Converting Unicode

Perl assumes everything (except newlines) is bytes unless you tell it otherwise. Python 2 did the same. Python 3 assumes (almost) everything is UTF-8 unless you tell it otherwise. Of the three, Python 3 is arguably the most broken; I can attest to this, having occasionally had to work with Python and ISO-8859-15 files.
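A small Python 3 sketch of the difference described above, assuming Python 3.3 or later. It only illustrates that `str` holds decoded text, `bytes` holds raw octets, and `str.encode()` / `bytes.decode()` fall back to UTF-8 when no encoding is named. The euro sign is chosen (as an illustration, not from the original post) because ISO-8859-15 stores it as the single byte 0xA4, which is not valid UTF-8 on its own.

```python
# Python 3 keeps decoded text (str) and raw octets (bytes) as separate types.
text = "\u20ac"  # EURO SIGN, present in ISO-8859-15 but not in ISO-8859-1

# With no encoding argument, str.encode() and bytes.decode() use UTF-8.
utf8_bytes = text.encode()
print(utf8_bytes)  # b'\xe2\x82\xac'

# In ISO-8859-15 (Latin-9) the same character is the single byte 0xA4.
latin9_bytes = text.encode("iso-8859-15")
print(latin9_bytes)  # b'\xa4'

# Decoding Latin-9 bytes with the UTF-8 default fails:
# 0xA4 cannot start a UTF-8 sequence.
decode_error = None
try:
    latin9_bytes.decode()
except UnicodeDecodeError as err:
    decode_error = err.reason
print(decode_error)  # invalid start byte

# Naming the real encoding recovers the original text.
recovered = latin9_bytes.decode("iso-8859-15")
print(recovered == text)  # True
```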

Tom Christiansen's answer on Stack Overflow seems to be the definitive explanation of why Perl doesn't do it this way. The perldoc pages perluniintro, perlunitut, perlunifaq, and perlunicode should give you most of what you want to know about Unicode in Perl.

Re^13: Converting Unicode
by Polyglot (Chaplain) on Dec 06, 2023 at 10:29 UTC
    Are you implying that if one is using nothing other than UTF8 (i.e. no need for ISO-8859-15), Python 3 might actually handle it just fine?

    If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

    Blessings,

    ~Polyglot~

      Sure, you can use Python 3 if you want. If you are the only one using the scripts, you control the environment, and you know everything will always be UTF8, then no worries. It's roughly equivalent to setting PERL_UNICODE=SDAL in your environment (see perldoc perlrun).

      Be sure to read the "𝔸𝕤𝕤𝕦𝕞𝕖 𝔹𝕣𝕠𝕜𝕖𝕟𝕟𝕖𝕤𝕤" (Assume Brokenness) section of the Tom Christiansen post I linked above, as nearly his entire post applies to any programming language.

      If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

      I didn't say that Python 3 does not handle non-UTF8 properly; I only said that it considers (almost) everything UTF8 by default, which is not the same thing at all. You can certainly read and write files in almost any encoding you want, but you have to take an extra step to do so.
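A minimal sketch of that extra step in Python 3, assuming a throwaway file in a temporary directory. The file name `latin9.txt` and the sample string are hypothetical. The point is only that passing `encoding=` to `open()` makes Python convert between `str` and ISO-8859-15 bytes on write and on read.

```python
import os
import tempfile

# Hypothetical sample: the euro sign and the "oe" ligature both exist in ISO-8859-15.
sample = "Price: 5 \u20ac (\u0153)"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "latin9.txt")  # hypothetical file name

    # The extra step: name the encoding instead of relying on the default.
    with open(path, "w", encoding="iso-8859-15") as fh:
        fh.write(sample)

    # Inspect the raw bytes that actually reached the disk.
    with open(path, "rb") as fh:
        raw = fh.read()

    # Read the file back as text, again naming the encoding.
    with open(path, encoding="iso-8859-15") as fh:
        round_trip = fh.read()

print(raw)                   # b'Price: 5 \xa4 (\xbd)'
print(round_trip == sample)  # True
```

Every character in the sample is a single byte in ISO-8859-15, so the file is exactly as many bytes long as the string is characters.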

      Edit: I forgot to include a link to perldoc perlunicook (and the original on perl.com), which is more of Tom's writing on Perl and Unicode.
