PerlMonks  

Re^2: Unicode: Perl5 equivalent to Perl6's @string.graphemes?

by ikegami (Pope)
on Nov 12, 2010 at 22:30 UTC (#871154)


in reply to Re: Unicode: Perl5 equivalent to Perl6's @string.graphemes?
in thread Unicode: Perl5 equivalent to Perl6's @string.graphemes?

Hm. Are you sure about \p{M}?

Yes. /\P{M}\p{M}*/ is a poor man's version of the (only recently available) /\X/. The idea is to match what a reader would consider a single character; these are called "graphemes". A grapheme can be formed by more than one Unicode code point. For example, this instance of the grapheme "é" is composed of the code points U+0065 (LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT). U+0065 matches /\P{M}/, and U+0301 matches /\p{M}/.
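To make that concrete, here is a minimal sketch (the variable names and the sample string are mine, not from the original question) that splits an already-decoded string both ways. It assumes a Perl recent enough to support /\X/.

```perl
use strict;
use warnings;

# Illustrative input: "e" + U+0301 (COMBINING ACUTE ACCENT), then "x".
# This is already a string of code points, not UTF-8 bytes.
my $text = "e\x{301}x";

# Poor man's graphemes: one non-mark followed by any number of marks.
my @poor_man = $text =~ /(\P{M}\p{M}*)/g;

# The same split using \X.
my @clusters = $text =~ /(\X)/g;

printf "code points: %d\n", length $text;
printf "poor man's:  %d\n", scalar @poor_man;
printf "\\X:          %d\n", scalar @clusters;
```

For this input both patterns should yield two pieces ("e" with its accent, then "x") out of three code points. The two approaches can differ on other inputs, since /\X/ knows about more than combining marks.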

He simply needs to apply the regex pattern to the decoded text (as his commented-out code would do) rather than to the UTF-8 bytes that represent the text.
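A rough sketch of the difference, assuming the data arrives as raw UTF-8 bytes (the byte string below is a made-up sample, not a line from edict, and the variable names are only illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative UTF-8 bytes for: "e", U+0301, U+FF11 (FULLWIDTH DIGIT ONE).
my $bytes = "e\xCC\x81\xEF\xBC\x91";

# Decode the bytes into a string of Unicode code points first.
my $text = decode('UTF-8', $bytes);

# Wrong: each byte is treated as its own character, so no marks are seen.
my @from_bytes = $bytes =~ /(\P{M}\p{M}*)/g;

# Right: the pattern sees U+0301 as a mark and attaches it to the "e".
my @from_text = $text =~ /(\P{M}\p{M}*)/g;

printf "pieces from bytes: %d\n", scalar @from_bytes;
printf "pieces from text:  %d\n", scalar @from_text;
printf "last code point:   U+%04X\n", ord $from_text[-1];
```

Matching against the bytes gives more pieces than there are graphemes, while matching against the decoded text gives one piece per grapheme. In a real script the decoding would more likely be done by an :encoding(UTF-8) layer on the file handle than by calling decode by hand.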

Also make sure that edict is really in UTF-8.

It surely is, since he got a U+FF11 (FULLWIDTH DIGIT ONE, a "1" as wide as a Japanese character) when treating the input as UTF-8.

