Well, actually, I recently wrote a fairly large application (20k lines) that allows Unicode everywhere and handles it correctly.

But it does not need 90% of the things that you listed.

That is probably because my application does not try to analyze text data (it only stores, converts, compares, and re-encodes it), so it needs neither sorting nor fc-aware comparison.

Your case is probably something that analyzes text (right now I can only imagine something related to natural language processing, a word processor, or maybe a dictionary).

So I think different applications need different levels of Unicode support.

Below are some cases where the policies you listed can be wrong:
lc($a) cmp/eq/ne/... lc($b) should be using fc. Same story with uc.
Something like a-z should often be \p{Ll} or \p{lower}
If you write, say, code that has to parse HTTP headers (and no, that is not necessarily reinventing the wheel like yet another HTTP library; it can be a proxy server or a REST library), then "cmp" and "a-z" are the correct choice, and fc() and \p{lower} can introduce bugs (say, with "ß" vs "ss").

Other examples are unit tests, where you usually deal with predefined data sets, internal program metadata that is always plain ASCII, comparison of MD5/SHA hex values, etc.
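
To illustrate the difference, here is a minimal sketch. The helper names ascii_ieq and unicode_ieq are made up for this example, and it assumes Perl 5.16 or later so that fc() is available:

```perl
use v5.16;      # enables strict, fc() and unicode_strings
use warnings;

# Hypothetical helper: case-insensitive equality restricted to ASCII
# letters, the kind of comparison a protocol token such as an HTTP
# header name calls for.
sub ascii_ieq {
    my ($x, $y) = @_;
    tr/A-Z/a-z/ for $x, $y;    # fold only A-Z; $x and $y are copies
    return $x eq $y;
}

# Hypothetical helper: Unicode-aware equality using full case folding.
sub unicode_ieq {
    my ($x, $y) = @_;
    return fc($x) eq fc($y);
}

my $sharp_s = "Stra\N{U+DF}e";    # "Straße"

say ascii_ieq("Content-Type", "content-type") ? "same" : "different";
say ascii_ieq($sharp_s, "STRASSE")            ? "same" : "different";
say unicode_ieq($sharp_s, "STRASSE")          ? "same" : "different";
```

With fc() the sharp s folds to "ss", so the last comparison treats the two strings as equal, while the ASCII-only comparison keeps them distinct. Which behaviour is right depends on whether the data is human text or protocol tokens.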
Opening a text file without stating its encoding somewhere or other is a recipe for failure.
Unless it's a binary file.
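
A rough sketch of the two cases follows. The helper names read_text and read_binary are hypothetical, and UTF-8 is only an assumed default; the in-memory "file" at the end is just a way to try it without touching the disk:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a text file, stating its encoding explicitly.
sub read_text {
    my ($path, $encoding) = @_;
    $encoding //= 'UTF-8';    # assumed default for this sketch
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open text file: $!";
    local $/;                 # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;             # decoded characters
}

# Hypothetical helper: slurp a binary file, no decoding and no
# newline translation.
sub read_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path
        or die "Cannot open binary file: $!";
    local $/;
    my $bytes = <$fh>;
    close $fh;
    return $bytes;            # raw octets
}

# In-memory "file" holding the UTF-8 bytes of "café".
my $octets = "caf\xC3\xA9";
printf "text length: %d, binary length: %d\n",
    length read_text(\$octets), length read_binary(\$octets);
```

The same five octets come back as four characters when read as text and as five bytes when read as binary.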

@lines = do { local $/; split /\R/, <INPUT> };
Hm. I don't think it's correct to treat something like U+2028 as a line separator for files.

You need code like this when you read from a text file, but a text file is something whose lines are separated by LF or CRLF; other combinations are not portable.

If you are writing a word processor that has to handle U+2028, you should not mix this with system file I/O; instead, introduce your own logic for splitting data into "lines" and paragraphs.

I don't see where it would be correct to mix "lines" from your word processor logic with the lines of a text file on disk (or on a socket).
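
A small sketch of the distinction I mean; the function names file_lines and unicode_lines are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: split file content on LF or CRLF only,
# i.e. on the line endings of a text file.
sub file_lines {
    my ($content) = @_;
    return split /\r?\n/, $content;
}

# Hypothetical helper: split on any Unicode linebreak sequence (\R),
# which also matches U+2028 LINE SEPARATOR inside the data.
sub unicode_lines {
    my ($content) = @_;
    return split /\R/, $content;
}

my $content = "first\r\nsecond\x{2028}still second\nthird\n";

my @on_disk  = file_lines($content);
my @by_regex = unicode_lines($content);

printf "LF/CRLF split: %d lines, \\R split: %d lines\n",
    scalar @on_disk, scalar @by_regex;
```

Here the U+2028 stays inside the second line when splitting on LF/CRLF, and it is then up to the application logic (the word processor) to decide what it means.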

In reply to Re^3: Where are the Perl::Critic Unicode policies for the Camel? by vsespb
in thread Where are the Perl::Critic policies for the Camel? by tchrist
