PerlMonks  

Re: Unaccenting characters

by moritz (Cardinal) on Aug 28, 2013 at 19:06 UTC


in reply to Unaccenting characters

I'm not a big fan of such big lookup tables, so instead I'd propose this:

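    use 5.010;      # for say
    use strict;
    use warnings;
    use utf8;       # the string literal below contains non-ASCII characters
    use Unicode::Normalize qw/NFKD/;

    sub unaccent {
        # shift (with no argument, inside a sub) removes and returns the
        # first element of @_, i.e. the string passed to unaccent().
        my $s = NFKD shift;
        # After decomposition the accents are separate combining marks;
        # \pM matches any mark, so this deletes all of them.
        $s =~ s/\pM//g;
        return $s;
    }

    say unaccent "Les Misérables";

    __END__
    Output: Les Miserables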

The NFKD normalization form splits each accented character into its base character followed by one or more separate combining marks, and the substitution then removes all the marks (\pM).
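The decomposition step can be seen directly by printing code points before and after normalization. This is a minimal sketch; `code_points` is a hypothetical helper written only for this illustration, while NFD and NFKD are the functions exported by Unicode::Normalize. It also shows where NFKD goes further than NFD: compatibility characters such as the "ﬁ" ligature are split as well.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFD NFKD/;

# Hypothetical helper: list a string's code points as "U+XXXX U+XXXX ...".
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord $_) } split //, $s;
}

# "\x{E9}" is the precomposed "é", a single code point.
print code_points("\x{E9}"), "\n";         # U+00E9
# NFD splits it into the base letter plus COMBINING ACUTE ACCENT.
print code_points(NFD("\x{E9}")), "\n";    # U+0065 U+0301

# "\x{FB01}" is the "ﬁ" ligature: NFD leaves it alone, while
# NFKD applies the compatibility mapping to a plain "f" and "i".
print code_points(NFD("\x{FB01}")), "\n";  # U+FB01
print code_points(NFKD("\x{FB01}")), "\n"; # U+0066 U+0069
```

For stripping accents alone, NFD would already be enough; NFKD additionally folds compatibility characters, which may or may not be wanted depending on what the unaccented text is used for.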

(And Unicode::Normalize has been a core module since perl 5.8, and you really, really don't want to use anything older than that for Unicode stuff.)


Re^2: Unaccenting characters
by mwhiting (Beadle) on Aug 29, 2013 at 16:37 UTC

    Thanks, I will try that. What is the 'shift' supposed to do in the code? I know what it does in general, but it was in the original code, and now here, and I don't quite see how it fits in.
