Re: Convert international characters to plain ASCII

by graff (Chancellor)
on Apr 04, 2012 at 02:04 UTC (#963348)

in reply to Convert international characters to plain ASCII

You want Unicode::Normalize, and specifically its NFD() function, which converts a string to its "canonical decomposition": every precomposed code point that combines a letter with a diacritic is converted to the bare letter followed by the separate "combining" form of the diacritic mark.

Once you have the string in that form, you get rid of the diacritic marks (leaving the letters in place) with a regex substitution that deletes every character having the Unicode "Mark" property, i.e. \p{M}.
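A minimal sketch of those two steps, assuming the input is already a decoded character string (not raw UTF-8 bytes). The helper name strip_marks() is made up for illustration and is not part of Unicode::Normalize; the \p{M} property matches any combining mark.

```perl
use strict;
use warnings;
use utf8;                          # this source file contains literal non-ASCII text
use Unicode::Normalize qw(NFD);

# strip_marks() is a hypothetical helper name, not something the module provides.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);   # e.g. "é" becomes "e" followed by U+0301
    $decomposed =~ s/\p{M}//g;     # delete every character with the Mark property
    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks("crème brûlée à la carte"), "\n";   # prints "creme brulee a la carte"
```

If your data arrives as bytes (from a file or socket), decode it first, for example with an :encoding(UTF-8) layer on the input handle, or NFD() will not see the characters you expect.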

(See the description of the "\p" regex options in perlunicode, perluniprops and perlre.)

Update: I forgot to mention -- even after taking care of the diacritic marks, be aware that you are likely to still have some non-ASCII characters left behind (i.e. things that are not an ASCII letter plus a diacritic mark, but are letters or punctuation that fall outside the ASCII range). You might need to tailor some ad-hoc replacements for those if you really need the data to be coherent in an ASCII-only environment.
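One way such ad-hoc replacements might look, as a sketch only. The %ascii_for table and the to_ascii() name are hypothetical, the table is deliberately incomplete, and replacing anything unmapped with "?" is just one possible policy (you might prefer to delete it or to die).

```perl
use strict;
use warnings;
use utf8;                          # literal non-ASCII keys in the table below
use Unicode::Normalize qw(NFD);

# Hypothetical, incomplete table: these characters have no canonical
# decomposition, so NFD() plus mark removal leaves them untouched.
my %ascii_for = (
    'ß' => 'ss',
    'æ' => 'ae', 'Æ' => 'AE',
    'ø' => 'o',  'Ø' => 'O',
    'đ' => 'd',  'Đ' => 'D',
    'ł' => 'l',  'Ł' => 'L',
    "\x{201C}" => '"', "\x{201D}" => '"',   # curly double quotes
    "\x{2018}" => "'", "\x{2019}" => "'",   # curly single quotes
    "\x{2013}" => '-', "\x{2014}" => '-',   # en dash, em dash
);

sub to_ascii {
    my ($text) = @_;
    my $out = NFD($text);
    $out =~ s/\p{M}//g;            # drop the combining marks first
    # Map whatever non-ASCII is left; unmapped characters become "?" (an assumption).
    $out =~ s/([^\x00-\x7F])/exists $ascii_for{$1} ? $ascii_for{$1} : '?'/ge;
    return $out;
}

print to_ascii("Straße in Łódź"), "\n";   # prints "Strasse in Lodz"
```

Which replacements are acceptable depends on the language and the use of the data, so treat the table as something to build up from the characters that actually show up in your input.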
