PerlMonks  

Answer: How do I normalize (e.g. strip) diacritical märks from a Unicode string?


Contributed by moritz

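The trick is to decompose each accented letter into its base letter followed by a separate combining mark, which is what the NFD function (Normalization Form D, canonical decomposition) from Unicode::Normalize does. The regex /\pM/ then matches those marking characters (Unicode general category Mark; see perlunicode), so substituting them with nothing leaves only the base letters.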
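```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize;

binmode STDOUT, ':encoding(UTF-8)';

my $s = "söme stüff\n";
$s = NFD($s);       # split each accented letter into base letter + combining mark
$s =~ s/\pM//g;     # delete every mark character
print $s;           # prints "some stuff"
```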
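To see what NFD actually does to a single letter, the sketch below prints the code points before and after decomposition. The helper `code_points` is hypothetical, written only for this illustration; NFD is the real Unicode::Normalize function. It shows that "ö" (U+00F6) becomes "o" (U+006F) plus U+0308 COMBINING DIAERESIS, and that only the latter matches /\pM/.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: render each character of a string as U+XXXX.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

my $precomposed = "\x{F6}";             # "ö", one character
my $decomposed  = NFD($precomposed);    # base letter plus combining mark

print code_points($precomposed), "\n";  # U+00F6
print code_points($decomposed),  "\n";  # U+006F U+0308

# Only the combining diaeresis belongs to the Unicode category Mark (\pM).
my @marks = grep { /\pM/ } split //, $decomposed;
print code_points(join '', @marks), "\n";   # U+0308
```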

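Depending on the application, NFKD may be more appropriate than NFD. NFKD additionally applies compatibility decompositions, so for example the ligature "ﬁ" (U+FB01) becomes "fi" and the superscript "²" becomes "2", whereas NFD leaves both unchanged.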
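A minimal sketch comparing the two forms on the same input, assuming you want to strip all marks in both cases. The helper `strip_with` is hypothetical; it simply takes the normalization function as a code reference so NFD and NFKD can be swapped.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: decompose with the given normalization function,
# then delete every mark character.
sub strip_with {
    my ($normalize, $str) = @_;
    my $decomposed = $normalize->($str);
    $decomposed =~ s/\pM//g;
    return $decomposed;
}

# "ﬁancé²": U+FB01 LATIN SMALL LIGATURE FI, "anc", e-acute, superscript two
my $s = "\x{FB01}anc\x{E9}\x{B2}";

print strip_with(\&NFD,  $s), "\n";   # ﬁance²  (ligature and superscript survive)
print strip_with(\&NFKD, $s), "\n";   # fiance2 (compatibility characters folded)
```

If the goal is ASCII-ish search keys, NFKD is usually the closer fit; if the text must keep its typographic characters, NFD is the safer choice.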

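The snippet above removes all marking characters, not just the diaeresis (the two dots in ö and ü): acute and grave accents are marks too, so they disappear as well. To remove only the diaeresis, delete just U+0308 COMBINING DIAERESIS and then recompose the remaining letters with NFC. The following code strips the diaeresis but leaves the other accents: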

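```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize;

binmode STDOUT, ':encoding(UTF-8)';

my $s = "söme stüff with áccènts\n";
$s = NFD($s);          # decompose: ö -> o + U+0308, á -> a + U+0301, ...
$s =~ s/\x{308}//g;    # remove only U+0308 COMBINING DIAERESIS
$s = NFC($s);          # recompose the remaining accented letters
print $s;              # prints "some stuff with áccènts"
```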
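Both snippets follow the same pattern (decompose, delete some marks, optionally recompose), so one way to package them is a helper that takes the set of marks to remove as a compiled regex. `strip_marks` is a hypothetical name used only for this sketch; passing qr/\pM/ reproduces the first snippet, and qr/\x{308}/ reproduces the second.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: decompose, delete the marks matched by $mark_re,
# then recompose whatever marks remain.
sub strip_marks {
    my ($str, $mark_re) = @_;
    my $decomposed = NFD($str);
    $decomposed =~ s/$mark_re//g;
    return NFC($decomposed);
}

my $s = "s\x{F6}me st\x{FC}ff with \x{E1}cc\x{E8}nts";   # "söme stüff with áccènts"

print strip_marks($s, qr/\pM/),     "\n";   # some stuff with accents
print strip_marks($s, qr/\x{308}/), "\n";   # some stuff with áccènts
```

Recomposing with NFC matters only when some marks are kept; after removing all of them it is a harmless no-op on the remaining base letters.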
