http://www.perlmonks.org?node_id=1053786

mwhiting has asked for the wisdom of the Perl Monks concerning the following question:

How do I case shift accented characters? I want to take "LES MISÉRABLES" and change it to "LES MISéRABLES". This is so that I can do a regex comparison against that string. I don't need to shift the rest of the characters because I can do a case-insensitive comparison on the rest of it (//i), but that doesn't work on the accented characters.

I tried the lc function, but it just gives me "LES MISRABLES"

Re: Case shifting on accented characters
by ikegami (Patriarch) on Sep 12, 2013 at 20:50 UTC

    //i does work on accented characters ...usually. When it doesn't, you can force it to using one of the following methods:

    A very likely possibility is that you don't actually have "é" or "É" in your string or in your code due to forgetting to decode, since you don't normally need the above.

    use utf8;  # Source file is encoded using UTF-8

    print "é" =~ /É/i ? 1 : 0, "\n";  # 1
    print "É" =~ /é/i ? 1 : 0, "\n";  # 1
    print "é" =~ /\w/ ? 1 : 0, "\n";  # 1
    print "É" =~ /\w/ ? 1 : 0, "\n";  # 1
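The "forgetting to decode" case mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the byte string stands in for input read without an encoding layer, the helper name matches_title is hypothetical, and the pattern is written with \x{E9} (é) so the snippet does not depend on how the source file itself is saved.

```perl
use 5.012;                # enables strict and the unicode_strings feature
use warnings;
use Encode qw(decode);

# Hypothetical helper: case-insensitive match against "misérables",
# with é written as \x{E9} so the source file can stay pure ASCII.
sub matches_title {
    my ($s) = @_;
    return $s =~ /mis\x{E9}rables/i ? 1 : 0;
}

# Assumed input: the raw UTF-8 bytes of "LES MISÉRABLES", as they might
# arrive from a file or socket opened without an :encoding layer.
my $bytes = "LES MIS\xC3\x89RABLES";

# Undecoded, É is the two bytes 0xC3 0x89, so there is no single
# character for //i to fold to é.
my $raw = matches_title($bytes);

# Decoded, É is the single character U+00C9, which //i folds to U+00E9.
my $decoded = matches_title(decode('UTF-8', $bytes));

print "raw: $raw, decoded: $decoded\n";
```

If the raw string fails to match and the decoded one matches, the problem is in how the input was read rather than in //i itself.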

    To answer your question, you could go about doing that by lowercasing non-ASCII characters using s/([^\x00-\x7F])/lc($1)/eg; with one of the above used.

    use utf8;                             # UTF-8 code
    use open ':std', ':encoding(UTF-8)';  # UTF-8 terminal
    use 5.012;

    $_ = "LES MISÉRABLES";
    s/([^\x00-\x7F])/lc($1)/eg;  # LES MISéRABLES
    say;
Re: Case shifting on accented characters (casefold, fc)
by Anonymous Monk on Sep 13, 2013 at 08:43 UTC