
am i also correct in assuming that perl automatically writes data as ISO-8859-1?

Not really. Perl outputs using whatever encoding you specify (via use open, binmode or some other means).
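To make that concrete, here is a minimal sketch that prints the same character through two different layers and dumps the resulting bytes. The helper name hex_bytes_via and the in-memory filehandle are my own stand-ins for illustration (a real script would normally call binmode on STDOUT or a file handle).

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through $layer into an in-memory
# handle and return the bytes that were written, as a hex string.
sub hex_bytes_via {
    my ($layer, $text) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open failed: $!";
    binmode($fh, $layer)     or die "binmode failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return unpack('H*', $buf);
}

my $e_acute = chr(0x00E9);    # U+00E9 LATIN SMALL LETTER E WITH ACUTE

print hex_bytes_via(':encoding(iso-8859-1)', $e_acute), "\n";    # e9
print hex_bytes_via(':encoding(UTF-8)',      $e_acute), "\n";    # c3a9
```

The character is the same in both calls; only the layer decides which bytes come out.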

If you don't specify, it outputs the internal representation of the string, which is either arbitrary bytes of unknown encoding (UTF8 flag off) or a lax variant of UTF-8 called utf8 (UTF8 flag on). If the UTF8 flag is on, you might also get a "Wide character" warning.
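A small sketch of the no-layer case, assuming an in-memory filehandle behaves like STDOUT here. The helper name print_raw is hypothetical. It prints a plain byte string and then a string holding a character above 0xFF, and collects any warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to a handle that has no encoding
# layer; return the bytes written (hex) and any warnings raised.
sub print_raw {
    my ($text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return (unpack('H*', $buf), join('', @warnings));
}

# A byte string (UTF8 flag off): the byte goes out unchanged.
my ($hex_bytes, $warn_bytes) = print_raw("\xE9");
print "$hex_bytes\n";     # e9

# U+2660 BLACK SPADE SUIT cannot be stored as a single byte,
# so its utf8 form is written and Perl warns about it.
my ($hex_wide, $warn_wide) = print_raw(chr(0x2660));
print "$hex_wide\n";      # e299a0
print $warn_wide;         # Wide character in print at ...
```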

If you happen to pass iso-latin-1 characters to Perl and you print these out, Perl will output iso-latin-1. But the same goes for any encoding.

    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    # Second perl outputs iso-8859-1
    $ perl -e'use open ":std", ":encoding(iso-8859-1)"; print chr(0x00E9)' | perl -e"print <>" | od -t x1
    0000000 e9
    0000001

    # U+0449 CYRILLIC SMALL LETTER SHCHA
    # Second perl outputs iso-8859-5
    $ perl -e'use open ":std", ":encoding(iso-8859-5)"; print chr(0x0449)' | perl -e"print <>" | od -t x1
    0000000 e9
    0000001

However, many aspects of Perl will presume the arbitrary bytes of unknown encoding are iso-latin-1. This includes uc, regexp character classes such as \w, explicit upgrades to utf8 (utf8::upgrade($_)), and implicit upgrades to utf8 (chop( $_ . chr(0x2660) )).
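Here is a rough sketch of the upgrade cases. The variable names are mine, and Encode's encode is used only to make the resulting characters visible as UTF-8 bytes. The byte 0xE9 could just as well have come from iso-8859-5 input (where it means U+0449), but once upgraded it is treated as U+00E9.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $byte = "\xE9";    # one byte, UTF8 flag off, encoding unknown to Perl

# Explicit upgrade: the byte is taken to be the character U+00E9.
my $explicit = $byte;
utf8::upgrade($explicit);
my $explicit_hex = unpack('H*', encode('UTF-8', $explicit));
print "$explicit_hex\n";    # c3a9

# Implicit upgrade: concatenating with a character above 0xFF
# forces the same interpretation of the original byte.
my $implicit     = $byte . chr(0x2660);
my $implicit_hex = unpack('H*', encode('UTF-8', $implicit));
print "$implicit_hex\n";    # c3a9e299a0

# On the upgraded copy, uc applies Unicode rules: U+00E9 becomes U+00C9.
my $upper_ord = sprintf('%04X', ord(uc($explicit)));
print "$upper_ord\n";       # 00C9
```

If the byte really was iso-8859-5, the string now holds the wrong character, which is why decoding input explicitly is the safer habit.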

In reply to Re^4: Character encoding of microns by ikegami
in thread Character encoding of microns by joec_
