Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Case shifting on accented characters

by mwhiting (Beadle)
on Sep 12, 2013 at 19:15 UTC ( #1053786=perlquestion: print w/ replies, xml ) Need Help??
mwhiting has asked for the wisdom of the Perl Monks concerning the following question:

How do I case shift accented characters? I want to take "LES MIS╔RABLES" and change it to "LES MISÚRABLES". This is so that I can do a regex comparison against that string. I don't need to shift the rest of the characters because I can do a case insensitive comparison on the rest of it (\\i), but that doesn't work on the accented characters.

I tried the lc function, but it just gives me "LES MISRABLES"

Comment on Case shifting on accented characters
Re: Case shifting on accented characters
by ikegami (Pope) on Sep 12, 2013 at 20:50 UTC

    //i does work on accented characters ...usually. When it doesn't, you can force it to using one of the following methods:

    A very likely possibility is that you don't actually have "Ú" or "╔" in your string or in your code due to forgetting to decode, since you don't normally need the above.

    use utf8; # Source file is encoded using UTF-8 print "Ú" =~ /╔/i ?1:0,"\n"; # 1 print "╔" =~ /Ú/i ?1:0,"\n"; # 1 print "Ú" =~ /\w/ ?1:0,"\n"; # 1 print "╔" =~ /\w/ ?1:0,"\n"; # 1

    To answer your question, you could go about doing that by lowercasing non-ASCII characters using s/([^\x00-\x7F])/lc($1)/eg; with one of the above used.

    use utf8; # UTF-8 code use open ':std', ':encoding(UTF-8)'; # UTF-8 terminal use 5.012; $_ = "LES MIS╔RABLES"; s/([^\x00-\x7F])/lc($1)/eg; # LES MISÚRABLES say;
Re: Case shifting on accented characters (casefold, fc)
by Anonymous Monk on Sep 13, 2013 at 08:43 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1053786]
Approved by Paladin
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (3)
As of 2014-07-24 03:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (156 votes), past polls