PerlMonks  

I'm sure that anyone getting started with Unicode in Perl will find your explanation useful -- nice post. But I think this part is a bit misleading:
Note, it is not so important which encoding is used by the "internal form". It can be any. Important is only that it is "internal", so it shouldn't be passed to external entities.

First, it actually is important that the "internal form" is (very much like) UTF-8 encoded Unicode. This means that ASCII characters are still plain single-byte ASCII, while every character really is a Unicode character (*), so that:

  • the Unicode character properties work as expected in regular expressions
  • Unicode code point numerics (e.g. "\x{abcd}") can be used in regexes or double-quoted strings
  • character normalization works according to Unicode specifications (cf. Unicode::Normalize)
  • normal string sorting works according to the established Unicode code-point order
  • other collations (e.g. character sort ordering for particular languages) implement Unicode-based specifications (see various Unicode::Collate modules on CPAN).
All that stuff tends to make multi-language string processing a lot easier.
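
To make a few of the points above concrete, here is a minimal sketch. The sample characters and variable names are my own choices for illustration, and it assumes only core modules (Unicode::Normalize has shipped with perl since 5.8):

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Code point escapes in double-quoted strings.
my $alpha = "\x{3b1}";     # GREEK SMALL LETTER ALPHA
my $zhe   = "\x{436}";     # CYRILLIC SMALL LETTER ZHE
my $e_dec = "e\x{301}";    # "e" followed by COMBINING ACUTE ACCENT

# Unicode character properties in regular expressions.
my $is_greek    = ( $alpha =~ /^\p{Greek}$/ )  ? 1 : 0;
my $is_cyrillic = ( $zhe =~ /^\p{Cyrillic}$/ ) ? 1 : 0;

# Normalization: NFC composes the two code points into U+00E9.
my $composed = NFC($e_dec);

# Plain sort (no locale, no collation module) compares by code point.
my @words  = ( $zhe, "z", $alpha, "A" );
my @sorted = sort @words;

print "greek=$is_greek cyrillic=$is_cyrillic\n";
printf "NFC length: %d, code point U+%04X\n", length($composed), ord($composed);
print join( " ", map { sprintf "U+%04X", ord } @sorted ), "\n";
```

For language-specific ordering you would swap the plain sort for something like Unicode::Collate, which I have left out here to keep the sketch short.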

Second, as for passing "internal format" strings to "external entities", this isn't necessarily a problem. A "perl-internal" utf8 string can be passed for insertion into a database table via DBI without further ado, or printed directly to a file handle if the file was opened for output with the ":utf8" IO layer.
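
As a rough illustration of the file handle case, the sketch below writes a character string through a ":utf8" layer and then reads the file back as raw bytes. The temp file and the sample string are assumptions of mine, chosen only for the demonstration; the DBI case is not shown because it needs a live database.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A 6-character string containing two non-ASCII characters.
my $text = "caf\x{e9} \x{3b1}";

# Create a scratch file, then reopen it for output with the :utf8 layer.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh;

open( my $out, '>:utf8', $path ) or die "Cannot open $path for writing: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Read the same file back without any decoding, to see what hit the disk.
open( my $in, '<:raw', $path ) or die "Cannot open $path for reading: $!";
my $bytes = do { local $/; <$in> };
close $in;

printf "%d characters written, %d bytes on disk\n",
    length($text), length($bytes);
```

The string is handed to print as-is; the layer on the file handle does the encoding, so the two non-ASCII characters come out as two bytes each.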

(* Update: well, the characters in the range U+0080 - U+00FF have some "special behaviors", but they really can be treated just like any other non-ASCII character.)
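
One of those "special behaviors" can be seen in a small sketch like the following. It assumes a perl run without "use feature 'unicode_strings'", without "use locale", and without regex modifiers such as /u or /a; the variable names are mine.

```perl
use strict;
use warnings;

# U+00E9 can be held internally as one native byte...
my $s = "\xe9";
my $word_before = ( $s =~ /\w/ ) ? 1 : 0;

# ...or as UTF-8 after an upgrade. The string value itself is unchanged.
my $copy = $s;
utf8::upgrade($copy);
my $word_after = ( $copy =~ /\w/ ) ? 1 : 0;

my $same = ( $s eq $copy ) ? 1 : 0;

print "equal strings: $same\n";
print "matches \\w before upgrade: $word_before\n";
print "matches \\w after upgrade:  $word_after\n";
```

The two strings compare equal, yet the \w match can differ depending on how the string happens to be stored. Enabling the unicode_strings feature (perl 5.12 and later) or using the /u modifier is meant to remove that difference.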


In reply to Re: text encodings and perl by graff
in thread text encodings and perl by andal
