
removing accent

by jeteve (Pilgrim)
on Aug 22, 2005 at 13:16 UTC ( #485681=perlquestion )
jeteve has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks !

I'd like to know how to remove accented characters from a string, replacing them with accent-free ones.

Thx for help !

-- Nice photos of naked perl sources here !

Replies are listed 'Best First'.
Re: removing accent
by socketdave (Curate) on Aug 22, 2005 at 13:24 UTC
    Text::Unidecode... This hasn't been updated in quite a while, but it should be a simple way to do what you're asking for.
      Thx a lot. That should fix my problem !

      -- Nice photos of naked perl sources here !

Re: removing accent
by zentara (Archbishop) on Aug 22, 2005 at 15:03 UTC
    A snippet I had laying around, from some lost node:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Unicode::Normalize;
    use Encode;

    my $string = "+lsctz}";
    print "$string\n";
    $string = decode("windows-1250", $string);
    $string = NFD($string);
    $string =~ s/\pM//og;
    print "$string\n";

    I'm not really a human, but I play one on earth. flash japh
      Huh. When I had to do this recently, I used NFKD. I guess from codepage 1250 there's no difference between the two, but for codepage 1252, there is - NFKD squashes superscripted 2s and 3s to regular 2s and 3s, and changes "½" to "1/2".

      Of course, whether or not such a squashing is desirable will depend on the application.
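      To make the NFD/NFKD difference concrete, here is a minimal sketch. The helper name strip_marks and the sample string are made up for illustration; only NFD and NFKD come from Unicode::Normalize. Note that NFKD turns "½" into "1", U+2044 FRACTION SLASH, "2", so the slash is not the ASCII "/".

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

# Hypothetical helper: decompose the string, then delete combining marks.
# A true second argument selects compatibility decomposition (NFKD).
sub strip_marks {
    my ($text, $compat) = @_;
    my $decomposed = $compat ? NFKD($text) : NFD($text);
    $decomposed =~ s/\pM//g;
    return $decomposed;
}

# Sample text: "cafe" with e-acute, superscript two, vulgar fraction one half.
my $sample = "caf\x{e9} m\x{b2} \x{bd}";

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks($sample, 0), "\n";   # NFD: accent gone, superscript and fraction kept
print strip_marks($sample, 1), "\n";   # NFKD: superscript and fraction squashed as well
```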

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
Re: removing accent
by polypompholyx (Chaplain) on Aug 22, 2005 at 13:30 UTC
    You will probably want a string substitution of some sort, using s/ä/a/g (or s/\x{e4}/a/g, where \x{NNNNN} is the accented character's Unicode code point). tr/ä/a/ might be a better choice, but this depends on exactly what you're trying to do. However, here be dragons: see perldoc perluniintro and related docs for the gory details of encoding schemes, Unicode and locales.
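    As a rough illustration of the tr approach, the sketch below maps a few accented Latin-1 characters to plain letters. The character table and sample text are made up for the example and are nowhere near complete; the input is assumed to be an already-decoded character string, not raw bytes.

```perl
use strict;
use warnings;

# Assumed input: a decoded character string ("deja vu, naive" with accents).
my $text = "d\x{e9}j\x{e0} vu, na\x{ef}ve";

# Hypothetical mapping table: a-grave, a-umlaut, e-acute, i-diaeresis.
# The two lists must line up position by position.
(my $plain = $text) =~ tr/\x{e0}\x{e4}\x{e9}\x{ef}/aaei/;

print "$plain\n";
```

    The table has to list every character you expect to meet, which is why the normalization-based replies in this thread scale better.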
Re: removing accent
by ysth (Canon) on Aug 22, 2005 at 23:17 UTC

      I mean, okay, munging the name of the letter is a cute hack, but there's a much, much easier way if you've got the Unicode::Normalize module available to you - just ask for the NFD or NFKD of the string, and then do a regular expression substitution to eliminate combining marks: s/\pM//g

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/

Node Type: perlquestion [id://485681]
Approved by holli