Re: Unaccenting characters




by moritz (Cardinal)

on Aug 28, 2013 at 19:06 UTC (id://1051312)


in reply to Unaccenting characters

I'm not a big fan of such big tables, so instead I'd propose this:

use 5.010;
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw/NFKD/;

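sub unaccent {
    # split accented characters into base character + combining marks
    my $s = NFKD shift;
    # delete all marks (Unicode general category M)
    $s =~ s/\pM//g;
    return $s;
}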

say unaccent "Les Misérables";
__END__
Output:
Les Miserables

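The NFKD normalization form splits each accented character into the base character and a separate combining mark (the accent), and the substitution then removes all the marks (\pM). NFD does the same split; NFKD additionally applies compatibility decompositions.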
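To make the decomposition step visible, here is a small sketch that prints code points before and after normalization. The helper `codepoints` is just made up for this illustration; `NFD` and `NFKD` are the real exports of Unicode::Normalize.

```perl
use 5.010;
use strict;
use warnings;
use Unicode::Normalize qw/NFD NFKD/;

# Hypothetical helper: format each character as its code point, e.g. "U+00E9"
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

my $precomposed = "\x{E9}";          # é as a single precomposed code point
say codepoints($precomposed);         # U+00E9
say codepoints(NFKD $precomposed);    # U+0065 U+0301 ('e' + COMBINING ACUTE ACCENT)

# Removing every Mark character leaves just the base letter
(my $stripped = NFKD $precomposed) =~ s/\pM//g;
say $stripped;                        # e

# NFD and NFKD differ on compatibility characters such as the "fi" ligature
my $ligature = "\x{FB01}";
say codepoints(NFD $ligature);        # U+FB01 (NFD leaves it alone)
say codepoints(NFKD $ligature);       # U+0066 U+0069 (plain "f" and "i")
```

Keep in mind that letters such as ø or ł have no decomposition at all, so this approach leaves them unchanged.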

(And Unicode::Normalize has been a core module since Perl 5.8, and you really, really don't want to use anything older than that for Unicode stuff.)

Perl 6 - the future is here, just unevenly distributed

Re^2: Unaccenting characters by mwhiting (Beadle) on Aug 29, 2013 at 16:37 UTC
Thanks, I will try that. What is the 'shift' supposed to do in the code? I know what it does in general, but it was in the original code, and now here, and I don't quite see how it fits in.
Re^3: Unaccenting characters by moritz (Cardinal) on Aug 29, 2013 at 17:36 UTC
Inside a subroutine, shift without an argument removes and returns the first element of @_, the subroutine's argument list, so here it fetches the string that is passed to unaccent.
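A minimal sketch of the same idea: inside a sub, a bare shift is short for shift(@_) (outside of any sub it works on @ARGV instead). The sub names below are hypothetical, made up only for this example.

```perl
use 5.010;
use strict;
use warnings;

# Bare shift inside a sub: removes and returns the first argument
sub first_arg_shift {
    my $s = shift;          # same as: my $s = shift @_;
    return $s;
}

# Equivalent result without modifying @_
sub first_arg_list {
    my ($s) = @_;           # copies the first element of @_
    return $s;
}

# After a shift, @_ is one element shorter
sub remaining_after_shift {
    my $first = shift;
    return scalar @_;
}

say first_arg_shift('hello', 'world');      # hello
say first_arg_list('hello', 'world');       # hello
say remaining_after_shift('a', 'b', 'c');   # 2
```

Both styles are common; `my ($s) = @_;` is handy when a sub takes several arguments, while `shift` reads nicely for a single one, as in unaccent.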

In Section Seekers of Perl Wisdom
