PerlMonks  

removing accent

by jeteve (Pilgrim)
on Aug 22, 2005 at 13:16 UTC (#485681)
jeteve has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks !

I'd like to know how to remove accented characters from a string, replacing them with accent-free ones.

Thx for help !

-- Nice photos of naked perl sources here !

Replies are listed 'Best First'.
Re: removing accent
by socketdave (Curate) on Aug 22, 2005 at 13:24 UTC
    Text::Unidecode... This hasn't been updated in quite a while, but it should be a simple way to do what you're asking for.
      Thx a lot. That should fix my problem !

      -- Nice photos of naked perl sources here !

Re: removing accent
by zentara (Archbishop) on Aug 22, 2005 at 15:03 UTC
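    A snippet I had lying around, from some lost node:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Unicode::Normalize;
        use Encode;

        # The sample literal looks mangled in this copy; it is meant to hold
        # Windows-1250 encoded bytes containing accented letters.
        my $string = "+lsctz}";
        print "$string\n";

        $string = decode("windows-1250", $string);  # bytes -> Perl characters
        $string = NFD($string);                      # split letters from their accents
        $string =~ s/\pM//g;                         # drop the combining marks
        print "$string\n";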

    I'm not really a human, but I play one on earth. flash japh
      Huh. When I had to do this recently, I used NFKD. I guess from codepage 1250 there's no difference between the two, but for codepage 1252, there is - NFKD squashes superscripted 2s and 3s to regular 2s and 3s, and changes "½" to "1/2".

      Of course, whether or not such a squashing is desirable will depend on the application.

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
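The NFD versus NFKD difference described above can be checked with a short sketch. The sample characters and the `codepoints` helper are assumptions chosen for illustration, and the output is printed as code points so it does not depend on the terminal encoding. Note that the slash NFKD produces for "½" is U+2044 FRACTION SLASH rather than the ASCII "/".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

# Hypothetical helper: render a string as a list of code points.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $s;
}

# e-acute, superscript two, vulgar fraction one half
my $text = "\x{e9}\x{b2}\x{bd}";

my $nfd  = NFD($text);
my $nfkd = NFKD($text);
s/\pM//g for $nfd, $nfkd;    # strip combining marks from both results

print "NFD:  ", codepoints($nfd),  "\n";   # superscript and fraction survive
print "NFKD: ", codepoints($nfkd), "\n";   # they become 2 and 1, U+2044, 2
```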
Re: removing accent
by polypompholyx (Chaplain) on Aug 22, 2005 at 13:30 UTC
    You will probably want a string substitution of some sort, using s/ä/a/g (or s/\x{e4}/a/g, where \x{NNNNN} is the accented character's Unicode code point). tr/ä/a/ might be a better choice, but this depends on exactly what you're trying to do. However, here be dragons: see perldoc perluniintro and related docs for the gory details of encoding schemes, Unicode and locales.
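As a rough illustration of the tr approach, the sketch below maps a small, hand-picked set of Latin-1 lowercase letters to ASCII. The function name `strip_listed_accents` and the choice of characters are assumptions made for this example; anything not listed passes through unchanged, which is the main limitation of a hand-written table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: transliterate a fixed list of accented letters.
sub strip_listed_accents {
    my ($text) = @_;
    # U+00E0..U+00E5 (a with accents) -> a, U+00E7 (c cedilla) -> c,
    # U+00E8..U+00EB (e with accents) -> e
    $text =~ tr/\x{e0}-\x{e5}\x{e7}\x{e8}-\x{eb}/aaaaaaceeee/;
    return $text;
}

# "deja vu, garcon" written with e-acute, a-grave and c-cedilla
my $in = "d\x{e9}j\x{e0} vu, gar\x{e7}on";
print strip_listed_accents($in), "\n";
```

Using \x{...} escapes keeps the script independent of the source file's encoding; with literal accented characters in the source you would also need `use utf8;` and a UTF-8 encoded file.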
Re: removing accent
by ysth (Canon) on Aug 22, 2005 at 23:17 UTC
      Ouch.

      I mean, okay, munging the name of the letter is a cute hack, but there's a much, much easier way if you've got the Unicode::Normalize module available to you - just ask for the NFD or NFKD of the string, and then do a regular expression substitution to eliminate combining marks: s/\pM//g

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
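A minimal sketch of the normalize-then-strip approach for UTF-8 input, assuming the data arrives as UTF-8 bytes and should leave as UTF-8 bytes. The name `strip_marks_utf8` is hypothetical. The second call shows a limitation: letters such as "ø" have no canonical decomposition, so there is no combining mark to remove.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);
use Encode qw(decode encode);

# Hypothetical helper: UTF-8 bytes in, UTF-8 bytes out, combining marks removed.
sub strip_marks_utf8 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl characters
    $chars = NFD($chars);                  # base letter + combining marks
    $chars =~ s/\pM//g;                    # drop the marks
    return encode('UTF-8', $chars);
}

# "creme brulee" with e-grave, u-circumflex and e-acute, as UTF-8 bytes
print strip_marks_utf8("cr\xc3\xa8me br\xc3\xbbl\xc3\xa9e"), "\n";

# o-with-stroke (U+00F8) does not decompose, so it is left as it was
print strip_marks_utf8("sm\xc3\xb8rrebr\xc3\xb8d"), "\n";
```

For characters like these, a transliteration module such as the Text::Unidecode mentioned earlier in the thread is one option to consider.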
