PerlMonks  

No idea what that means.
RTFM then.
"By default, there is a fundamental asymmetry in Perl's Unicode model: implicit upgrading from byte strings to Unicode strings assumes that they were encoded in ISO 8859-1 (Latin-1)"
It's completely false: nothing in Perl accepts or produces latin-1, and it has nothing to do with anything discussed so far.
LOL. It looks like Latin-1 and quacks like Latin-1, but it's not Latin-1. Yeah, it's just 'byte-packed subset of Unicode'.
"Whenever your encoded, binary string is used together with a text string, Perl will assume that your binary string was encoded with ISO-8859-1, also known as latin-1. If it wasn't latin-1, then your data is unpleasantly converted. For example, if it was UTF-8, the individual bytes of multibyte characters are seen as separate characters, and then again converted to UTF-8."
How about you 'fix' Perl's documentation, and then start arguing... It even talks about 'Unicode' and 'binary' strings (gasp).
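For anyone following along, here is a minimal sketch of the "unpleasant conversion" the quoted documentation describes. The variable names are mine, and it assumes the core Encode module; it is an illustration of the documented behaviour, not a statement about how the original poster's data was produced.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A text string holding one character above 0xFF (BLACK SPADE SUIT).
my $text = chr(0x2660);

# The UTF-8 encoding of U+00E9 as two raw bytes, never decoded.
my $bytes = chr(0xC3) . chr(0xA9);

# Concatenation treats each byte as the code point with the same number.
my $mixed = $text . $bytes;
my $mixed_points = join ' ', map { sprintf 'U+%04X', ord } split //, $mixed;
print "mixed code points: $mixed_points\n";

# Encoding the result re-encodes those two "characters" a second time.
my $mixed_hex = join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $mixed);
print "mixed as UTF-8:    $mixed_hex\n";

# Decoding the bytes before concatenating keeps U+00E9 intact.
my $fixed = $text . decode('UTF-8', $bytes);
my $fixed_hex = join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $fixed);
print "fixed as UTF-8:    $fixed_hex\n";
```

The mixed string ends up three characters long (U+2660 U+00C3 U+00A9) and encodes to E2 99 A0 C3 83 C2 A9, while the decoded version is two characters and encodes to E2 99 A0 C3 A9.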
my $x = chr(0x2660); my $y = chr(0xC3).chr(0xA9); $x . $y;
This is all the information Perl currently has. Is that an error?
Is it an error that perl -wE 'my $x = chr(0x00A9); say $x' does one thing, and perl -wE 'my $y = chr(0x2660); say $y' does something else? I dunno. You tell me. Intuitively, there should be no difference whatsoever: chr should be consistent, say should be consistent, everything should be... (confused) (not really).
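To make the difference between those two one-liners visible without depending on a terminal, here is a rough sketch that writes to an in-memory handle and dumps the raw bytes. The helper raw_output_hex and its optional layer argument are hypothetical names of mine, and the "Wide character in print" warning is silenced only to keep the demo quiet.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string to an in-memory handle opened with
# the given layer (none by default) and return the raw bytes as hex.
sub raw_output_hex {
    my ($string, $layer) = @_;
    $layer = '' unless defined $layer;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # hide "Wide character in print" for the demo
        print {$fh} $string;
    }
    close $fh or die "cannot close in-memory handle: $!";
    return join ' ', map { sprintf '%02X', ord } split //, $buffer;
}

# No layer: a character below 0x100 goes out as one byte ...
print 'chr(0x00A9), no layer:    ', raw_output_hex(chr(0x00A9)), "\n";
# ... while a wide character goes out as its UTF-8 bytes.
print 'chr(0x2660), no layer:    ', raw_output_hex(chr(0x2660)), "\n";

# With an explicit encoding layer both are treated the same way.
print 'chr(0x00A9), UTF-8 layer: ', raw_output_hex(chr(0x00A9), ':encoding(UTF-8)'), "\n";
print 'chr(0x2660), UTF-8 layer: ', raw_output_hex(chr(0x2660), ':encoding(UTF-8)'), "\n";
```

Without a layer the first call yields the single byte A9 and the second yields E2 99 A0; with the :encoding(UTF-8) layer the first becomes C2 A9, so the inconsistency goes away once the handle is told what encoding to write.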
This is all the information Perl currently has. Is that an error? You can't tell. Perl can't tell. Strings coming from a file handle with a decoding layer should be flagged "I'm decoded text!". Those coming from a file handle without a decoding layer should be flagged "I'm bytes!". Concatenating the two should be an error. These flags do not currently exist.
So you're not even disagreeing. You just hate the word 'Latin-1'. I'm done with you. Have a nice day.

In reply to Re^7: Database vs XML output representation of two-byte UTF-8 character by Anonymous Monk
in thread Database vs XML output representation of two-byte UTF-8 character by jkeenan1
