PerlMonks  

Re: Unaccenting characters

by moritz (Cardinal)
on Aug 28, 2013 at 19:06 UTC


in reply to Unaccenting characters

I'm not a big fan of such big tables, so instead I'd propose this:

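    use 5.010;
    use strict;
    use warnings;
    use utf8;                          # the string literal below is UTF-8
    use Unicode::Normalize qw/NFKD/;

    sub unaccent {
        my $s = NFKD shift;            # decompose: base letter + combining marks
        $s =~ s/\pM//g;                # strip every mark character (\pM)
        return $s;
    }

    say unaccent "Les Misérables";

    __END__
    Output:
    Les Miserables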

The NFKD normalization form (like NFD, but with compatibility decompositions applied as well) splits each accented character into the base character and the accent as two separate characters, and the substitution then removes all the marks (\pM).
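To see the decomposition step in isolation, here is a small sketch that prints code points after normalization. The `codepoints` helper is hypothetical, written only for this illustration. It also shows where NFKD differs from NFD (compatibility characters such as the "fi" ligature), and one limit of the approach: letters like ø have no decomposition at all, so stripping \pM leaves them unchanged.

```perl
use 5.010;
use strict;
use warnings;
use utf8;                                 # source literals below are UTF-8
use Unicode::Normalize qw/NFD NFKD/;

# Hypothetical helper: render a string as code points, e.g. "U+0065 U+0301"
sub codepoints {
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, shift;
}

# Canonical decomposition: "é" (U+00E9) becomes "e" + COMBINING ACUTE ACCENT
say codepoints(NFD "é");          # U+0065 U+0301

# Removing the \pM (Mark) characters leaves only the base letter
(my $base = NFD "é") =~ s/\pM//g;
say $base;                         # e

# NFKD also applies compatibility decompositions, which NFD does not
say codepoints(NFD  "\x{FB01}");   # U+FB01 (LATIN SMALL LIGATURE FI kept)
say codepoints(NFKD "\x{FB01}");   # U+0066 U+0069 ("fi")

# No decomposition exists for "ø", so there is no mark to strip
say codepoints(NFKD "ø");          # U+00F8
```

So for plain accented Latin letters NFD and NFKD give the same result here; NFKD just folds a few more compatibility characters along the way.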

(And Unicode::Normalize has been a core module since Perl 5.8, and you really, really don't want to use anything older than that for Unicode stuff.)


Re^2: Unaccenting characters
by mwhiting (Beadle) on Aug 29, 2013 at 16:37 UTC

    Thanks, I will try that. What is the 'shift' supposed to do in the code? I know what it does in general, but it was in the original code, and now here, and I don't quite see how it fits in.
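A short sketch of what that bare `shift` does: inside a sub, the arguments arrive in `@_`, and `shift` with no argument means `shift @_`, i.e. it removes and returns the first argument. So `my $s = NFKD shift;` reads as "take the first argument passed to unaccent and normalize it". The subs below (`shout`, `first_via_shift`, `first_via_list`) are hypothetical, made up only to show the idiom.

```perl
use 5.010;
use strict;
use warnings;

# Inside a sub, a bare shift means shift(@_):
# it removes and returns the first argument passed to the sub.
sub shout {
    my $word = shift;          # same as: my $word = shift @_;
    return uc $word;
}

# Two equivalent ways of reading the first argument
sub first_via_shift { my $x = shift; return $x }
sub first_via_list  { my ($x) = @_;  return $x }

say shout('hello');              # HELLO
say first_via_shift('a', 'b');   # a
say first_via_list('a', 'b');    # a
```

Note that outside of any sub, a bare `shift` works on `@ARGV` instead of `@_`.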

