PerlMonks
Everything "looks" fine until you try to extract substrings in some way. That's because, without decoding your data on input, the strings are handled as sequences of bytes, so a character like ä (two bytes in UTF-8) is treated as two separate bytes.
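To make the byte/character difference concrete, here is a minimal sketch. It assumes the input is UTF-8 and uses the core Encode module; the sample word and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "Bär", as they would arrive from an undecoded filehandle.
my $bytes = "B\xC3\xA4r";

# Without decoding, Perl counts bytes: the ä contributes two of them.
my $byte_len = length $bytes;

# Taking the first two "characters" cuts ä in half: "B" plus a lone 0xC3 byte.
my $broken = substr $bytes, 0, 2;

# After decoding, the same data is a string of characters.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;
my $prefix   = substr $chars, 0, 2;   # "B" followed by ä, intact

# Encoding again yields valid UTF-8 bytes suitable for output.
my $out = encode('UTF-8', $prefix);

printf "bytes: %d, characters: %d\n", $byte_len, $char_len;
# prints "bytes: 4, characters: 3"
```

In a real program you would more likely put an I/O layer such as :encoding(UTF-8) on the filehandle, so decoding and encoding happen automatically; the explicit calls here just make each step visible.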
If you extract part of a string without decoding it first, you can accidentally rip those two bytes apart and leave encoding garbage behind, which is usually not a good idea. So I recommend properly decoding UTF-8 (and other character encodings) on input and encoding the strings on output. And add use utf8; if you have non-ASCII string constants in your source code.
In reply to Re^3: Modern best practices for multilingual regexp alphabetical character matching?
by moritz