removing accent

by jeteve (Pilgrim)
on Aug 22, 2005 at 13:16 UTC
jeteve has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks !

I'd like to know how to remove accented characters from a string, replacing them with their accent-free equivalents.

Re: removing accent
by socketdave (Curate) on Aug 22, 2005 at 13:24 UTC
    Text::Unidecode... This hasn't been updated in quite a while, but it should be a simple way to do what you're asking for.
Re: removing accent
by zentara (Archbishop) on Aug 22, 2005 at 15:03 UTC
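    A snippet I had lying around, from some lost node:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Unicode::Normalize;
    use Encode;

    my $string = "+lsctz}";
    print "$string\n";

    # Decode the windows-1250 bytes into Perl characters.
    $string = decode("windows-1250", $string);
    # NFD splits each accented letter into a base letter plus combining marks.
    $string = NFD($string);
    # \pM matches any combining mark; deleting them leaves the base letters.
    $string =~ s/\pM//g;
    print "$string\n";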

      Huh. When I had to do this recently, I used NFKD. I guess from codepage 1250 there's no difference between the two, but for codepage 1252, there is - NFKD squashes superscripted 2s and 3s to regular 2s and 3s, and changes "½" to "1/2".

      Of course, whether or not such a squashing is desirable will depend on the application.
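
      To make the NFD/NFKD difference concrete, here is a minimal sketch using only the core Encode and Unicode::Normalize modules. The sample bytes are an assumption chosen for illustration (cp1252 "é", superscript two, and "½"), not data from the thread. One detail worth knowing: NFKD turns "½" into "1", U+2044 FRACTION SLASH, "2", so a final substitution is needed if a plain ASCII "/" is wanted.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD NFKD);

# Illustrative cp1252 bytes (an assumption): e-acute, superscript two, one half.
my $bytes = "\xE9 \xB2 \xBD";
my $text  = decode("cp1252", $bytes);

# Normalize each way, then delete combining marks.
(my $nfd  = NFD($text))  =~ s/\pM//g;
(my $nfkd = NFKD($text)) =~ s/\pM//g;

# NFKD leaves U+2044 FRACTION SLASH in the fraction; map it to "/" for ASCII.
(my $ascii = $nfkd) =~ s{\x{2044}}{/}g;

binmode STDOUT, ":encoding(UTF-8)";
print "NFD:   $nfd\n";    # accent gone, superscript and fraction untouched
print "NFKD:  $nfkd\n";   # superscript and fraction decomposed as well
print "ASCII: $ascii\n";
```

      NFD only touches the accented letter here, while NFKD also rewrites the compatibility characters, which is the squashing described above.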

Re: removing accent
by polypompholyx (Chaplain) on Aug 22, 2005 at 13:30 UTC
    You will probably want a string substitution of some sort, using s/ä/a/g (or s/\x{e4}/a/g, where \x{NNNNN} is the accented character's Unicode code point). tr/ä/a/ might be a better choice, but this depends on exactly what you're trying to do. However, here be dragons: see perldoc perluniintro and related docs for the gory details of encoding schemes, Unicode and locales.
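
    As a rough sketch of the tr approach, the following maps a handful of Latin-1 lowercase vowels to their base letters. The sample string and the particular set of code points are assumptions for illustration; a real table would need every accented character you expect to meet, which is where the normalization-based answers in this thread scale better.

```perl
use strict;
use warnings;

# "naive cafe" with i-diaeresis and e-acute, written with \x{..} escapes
# so the source file itself stays pure ASCII.
my $text = "na\x{ef}ve caf\x{e9}";

# Each code point in the search list maps to the letter at the same
# position in the replacement list:
#   \x{e0}-\x{e5} (6 chars) -> a, \x{e8}-\x{eb} (4) -> e, \x{ec}-\x{ef} (4) -> i
(my $plain = $text) =~ tr/\x{e0}-\x{e5}\x{e8}-\x{eb}\x{ec}-\x{ef}/aaaaaaeeeeiiii/;

print "$plain\n";   # prints "naive cafe"
```

    Characters outside the table pass through unchanged, so missing entries fail silently rather than loudly.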
Re: removing accent
by ysth (Canon) on Aug 22, 2005 at 23:17 UTC

      I mean, okay, munging the name of the letter is a cute hack, but there's a much, much easier way if you've got the Unicode::Normalize module available to you - just ask for the NFD or NFKD of the string, and then do a regular expression substitution to eliminate combining marks: s/\pM//g

