http://www.perlmonks.org?node_id=11156131


in reply to Re^11: Converting Unicode
in thread Converting Unicode

Perl assumes everything (except newlines) is bytes unless you tell it otherwise. Python 2 did the same. Python 3 assumes (almost) everything is UTF-8 unless you tell it otherwise. Of the three, Python 3 is arguably the most broken. I can attest to this from occasionally having to work with Python and ISO-8859-15 files.

Tom Christiansen's answer on Stack Overflow seems to be the definitive explanation of why Perl doesn't do it this way. The perldoc pages perluniintro, perlunitut, perlunifaq, and perlunicode should give you most of what you want to know about Unicode in Perl.

Re^13: Converting Unicode
by Polyglot (Chaplain) on Dec 06, 2023 at 10:29 UTC
    Are you implying that if one is using nothing other than UTF8 (i.e. no need for ISO-8859-15), Python 3 might actually handle it just fine?

    If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

    Blessings,

    ~Polyglot~

      Sure, you can use Python 3 if you want. If you are the only one using the scripts, you control the environment, and you know everything will always be UTF8, then no worries. It's roughly equivalent to setting PERL_UNICODE=SDAL in your environment (see perldoc perlrun).

      Be sure to read the "𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤" section of Tom Christiansen's post that I linked above, as nearly his entire post applies to any programming language.

      If the "brokenness" of Python is because it cannot handle non-UTF8 properly, that would not impact me at all, as everything I'm doing is with UTF8.

      I didn't say that Python 3 does not handle non-UTF8 properly. I only said that it treats (almost) everything as UTF8 by default, which is not the same thing at all. You can certainly read and write files in almost any encoding you want, but you have to take an extra step to do so.
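      As a rough illustration of that extra step, here is a minimal Python 3 sketch. The sample text and the file name latin9.txt are made up for the example; the only real point is the explicit encoding= argument to open(). It also shows what tends to go wrong when an ISO-8859-15 file is read as UTF-8.

```python
import os
import tempfile

# Hypothetical sample text; the euro sign is byte 0xA4 in ISO-8859-15.
text = "Prix : 10 €, déjà payé\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin9.txt")  # made-up file name

    # The extra step when writing: name the encoding explicitly.
    with open(path, "w", encoding="iso-8859-15", newline="") as fh:
        fh.write(text)

    # Look at the raw bytes that ended up on disk.
    with open(path, "rb") as fh:
        raw = fh.read()

    # The same extra step when reading the file back.
    with open(path, "r", encoding="iso-8859-15", newline="") as fh:
        round_trip = fh.read()

    # Treating the same file as UTF-8 fails, because 0xA4 on its own
    # is not a valid UTF-8 sequence.
    try:
        with open(path, "r", encoding="utf-8") as fh:
            fh.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
```

      With the encoding named on both sides the text round-trips unchanged; without it, whether the read works depends on whatever default the interpreter picks up from its environment.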

      Edit: I forgot to include a link to perldoc perlunicook (and the original on perl.com), which is more of Tom's writing on Perl and Unicode.