PerlMonks |
Well, actually I recently wrote a not-so-small application (20k lines) that allows Unicode everywhere and handles it correctly.
But it does not need 90% of the things you listed. Probably because my application does not try to analyze text data (it only stores, converts, compares, and re-encodes it), it needs neither sorting nor fc-aware comparison.

Your case is probably something that analyzes text (right now I can only imagine something related to natural language processing, a word processor, or maybe a dictionary).

So I think different applications need different levels of Unicode support.

Below are some cases where the policies you listed can be wrong in some circumstances:

"lc($a) cmp/eq/ne/... lc($b) should be using fc. Same story with uc."

"Something like a-z should often be \p{Ll} or \p{lower}"

If you write, say, code that has to parse HTTP headers (no, that's not reinventing the wheel like yet another HTTP library; it can be a proxy server or a REST library), then "cmp" and "a-z" would be the correct choice, and fc() or \p{lower} can introduce bugs (say, with "ß" vs "ss"). Other examples are unit tests, where you usually deal with predefined data sets, internal program metadata that is always plain ASCII, comparison of MD5/SHA hex values, etc.

"Opening a text file without stating its encoding somewhere or other is a recipe for failure."

Unless it's a binary file.

"@lines = do { local $/; split /\R/, <INPUT> };"

Hm. I think it's not correct to use something like U+2028 as a line separator for files. You need code like this when you read from a text file, and a text file is something whose lines are separated by LF or CRLF; other combinations are not portable. If you are writing a word processor that should handle U+2028, you should not mix this with system file I/O; instead, introduce your own logic for splitting data into "lines" and paragraphs. I don't see where it can be correct to mix "lines" from your word processor logic with the lines of a text file on disk (or on a socket).

In reply to Re^3: Where are the Perl::Critic Unicode policies for the Camel? by vsespb
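
To make the two objections above concrete, here is a minimal sketch, not taken from either post. The helper names (ascii_ieq, fold_eq, split_any_linebreak, split_lf_crlf) and the header name "X-Straße" are hypothetical and exist only for illustration. It assumes Perl 5.16 or later, so that fc and Unicode string semantics are available.

```perl
use v5.16;      # enables strict, say, fc, and unicode_strings
use warnings;

# Hypothetical helper: case-insensitive comparison that folds only ASCII
# letters, which is what protocol tokens such as HTTP header names call for.
sub ascii_ieq {
    my ($x, $y) = @_;
    return ($x =~ tr/A-Z/a-z/r) eq ($y =~ tr/A-Z/a-z/r);
}

# Hypothetical helper: Unicode-aware comparison using full case folding.
sub fold_eq {
    my ($x, $y) = @_;
    return fc($x) eq fc($y);
}

# Hypothetical helper: split on any generic line break (\R also matches
# U+0085, U+2028, and U+2029, not just LF and CRLF).
sub split_any_linebreak {
    my ($text) = @_;
    return split /\R/, $text;
}

# Hypothetical helper: split only on LF or CRLF, as in a plain text file.
sub split_lf_crlf {
    my ($text) = @_;
    return split /\r?\n/, $text;
}

# "\x{DF}" is LATIN SMALL LETTER SHARP S; its full case fold is "ss".
my $name = "X-Stra\x{DF}e";
say "fold_eq:   ", (fold_eq($name, "X-STRASSE")   ? "match" : "no match");
say "ascii_ieq: ", (ascii_ieq($name, "X-STRASSE") ? "match" : "no match");

# U+2028 inside a line: a line break for \R, ordinary data for LF/CRLF.
my $text  = "one\x{2028}two\r\nthree\n";
my @any   = split_any_linebreak($text);
my @plain = split_lf_crlf($text);
say "split on \\R:      ", scalar(@any),   " lines";
say "split on LF/CRLF: ", scalar(@plain), " lines";
```

With full case folding the two different byte sequences compare as equal, which is fine for human text but a false match for a protocol token; the ASCII-only comparison keeps them distinct. Likewise, \R treats U+2028 as a line break, so it yields three lines where an LF/CRLF split yields two.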