PerlMonks  

Re^11: Converting Unicode

by Polyglot (Chaplain)
on Dec 06, 2023 at 02:27 UTC


in reply to Re^10: Converting Unicode
in thread Converting Unicode

Thank you so much for chiming in here. Perl is not fully Unicode-compatible yet, but people who don't use Unicode regularly, particularly with Asian scripts, will likely be oblivious to this and unable to understand the situation. Your points are valid and need more attention.

I attended a week of Python training last year. At the time I smugly felt Perl to be superior in many ways. Now I'm wondering if I should pursue Python more seriously. It does have its advantages, even if I feel bothered by its strict formatting rules.

Blessings,

~Polyglot~

Re^12: Converting Unicode
by jeffenstein (Hermit) on Dec 06, 2023 at 09:28 UTC

    Perl assumes everything (except newlines) is bytes unless you tell it otherwise. Python 2 did the same. Python 3 assumes (almost) everything is UTF-8 unless you tell it otherwise. Of the three, Python 3 is arguably the most broken. I can attest to this, having occasionally had to work with Python and ISO-8859-15 files.

    Tom Christiansen's answer on Stack Overflow seems to be the definitive explanation of why Perl doesn't do it this way. perldoc perluniintro, perlunitut, perlunifaq, and perlunicode should give you most of what you want to know about Unicode in Perl.
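
To make the bytes-versus-text point above concrete, here is a minimal Python 3 sketch. It assumes nothing beyond the standard library; the sample text and the helper name `try_decode` are invented for illustration. It shows that the same bytes decode cleanly as ISO-8859-15 but are rejected when they are assumed to be UTF-8.

```python
# Bytes of the text "10 €" as they would be stored in an ISO-8859-15 file.
# "\u20ac" is the euro sign; ISO-8859-15 stores it as the single byte 0xA4.
raw = "10 \u20ac".encode("iso-8859-15")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode bytes to text, or describe the failure instead of raising.

    Hypothetical helper, written only for this example.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return f"<cannot decode as {encoding}: {err.reason}>"


as_latin9 = try_decode(raw, "iso-8859-15")  # the encoding the bytes really use
as_utf8 = try_decode(raw, "utf-8")          # the encoding a naive reader assumes

# ascii() keeps the output printable even on a terminal that cannot
# display the euro sign.
print(ascii(raw))
print(ascii(as_latin9))
print(ascii(as_utf8))
```

The bytes themselves carry no label saying which encoding they use, so in any language the program has to be told, or has to guess.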

      Are you implying that if one is using nothing other than UTF8 (i.e. no need for ISO-8859-15), Python 3 might actually handle it just fine?

      If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

      Blessings,

      ~Polyglot~

        Sure, you can use Python 3 if you want. If you are the only one using the scripts, you control the environment, and you know everything will always be UTF8, then no worries. It's roughly equivalent to setting PERL_UNICODE=SDAL in your environment (see perldoc perlrun).
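
Whether "everything will always be UTF8" actually holds in a given environment can be checked rather than assumed. The following is a rough sketch using only standard-library calls; the function name `describe_text_defaults` is made up for this example. Depending on the Python version and platform, the file and stream defaults may follow the locale instead of being UTF-8.

```python
import locale
import sys


def describe_text_defaults() -> dict:
    """Collect the encodings this interpreter falls back on by default."""
    return {
        # Encoding used by str.encode()/bytes.decode() with no argument.
        "str/bytes default": sys.getdefaultencoding(),
        # Locale-preferred encoding, which open() in text mode has
        # traditionally used when no encoding= argument is given.
        "locale preferred": locale.getpreferredencoding(False),
        # Encoding used for file names passed to the operating system.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output (None if stdout has no encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
        # 1 when UTF-8 mode is on (python -X utf8 or PYTHONUTF8=1).
        "utf8_mode": sys.flags.utf8_mode,
    }


defaults = describe_text_defaults()
for name, value in defaults.items():
    print(f"{name}: {value}")
```

If the locale-preferred and stdout entries do not report UTF-8, scripts that rely on the defaults may behave differently on that machine.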

        Be sure to read the "𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤" section of Tom Christiansen's post that I linked above, as nearly his entire post applies to any programming language.

        If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

        I didn't say that Python 3 does not handle non-UTF8 properly; I only said that it considers (almost) everything UTF8 by default, which is not the same at all. You can certainly read/write files in almost any encoding you want, but you have to take an extra step to do so.

        Edit: I forgot to include a link to perldoc perlunicook (and the original on perl.com), which is more of Tom's writing on Perl and Unicode.
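
The "extra step" for non-UTF-8 files mentioned in the reply above can be sketched in Python 3 as follows. This is only an illustration: the file name and sample text are invented, and the step in question is simply passing `encoding=` to the built-in `open()`.

```python
import os
import tempfile

text = "Prix : 10 \u20ac"  # \u20ac is the euro sign

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "latin9.txt")

# Write the text as ISO-8859-15 by naming the encoding explicitly.
with open(path, "w", encoding="iso-8859-15") as fh:
    fh.write(text)

# Read it back with the same explicit encoding.
with open(path, encoding="iso-8859-15") as fh:
    round_trip = fh.read()

# Binary mode skips decoding entirely and shows the stored bytes.
with open(path, "rb") as fh:
    stored = fh.read()

print(round_trip == text)
print(ascii(stored))
```

Naming the encoding on every `open()` call also keeps a script independent of whatever default the local environment happens to supply.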
