Lowercase and normalize a unicode string

mugwumpjism has asked for the wisdom of the Perl Monks concerning the following question:

Is there an easy way to normalize and lowercase a Unicode string? This probably works for most Latin-script languages:

use Unicode::Normalize;
my $lower_nfc = NFC(lc(NFD($string)));

However, this doesn't work for, say, Greek. It seems that the Perl Unicode API would require this:

use Unicode::UCD qw(charinfo);
use Unicode::Normalize;
my $nfd_string = NFD($string);
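# charinfo()->{lower} is an empty string when a character has no lowercase mapping
$nfd_string =~ s{(\p{Lu})}{ my $lower = charinfo(ord($1))->{lower}; length($lower) ? chr(hex($lower)) : $1 }ge;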
my $nfc_string = NFC($nfd_string);

Surely there's an easier way...

$h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/."
."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";


Replies are listed 'Best First'.

Re: Lowercase and normalize a unicode string
by ikegami (Patriarch) on Nov 02, 2010 at 04:51 UTC

lc alone should do. You shouldn't have to use normalize, and it should work for all scripts. Make sure you've decoded the text.
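
To illustrate the point about decoding, here is a minimal sketch, assuming the input arrives as raw UTF-8 bytes. The helper name `lc_utf8_bytes` is made up for this example, and the `NFC` call is only there in case a canonical form is wanted afterwards; `lc` itself does not need it.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: takes raw UTF-8 bytes, returns lowercased characters.
sub lc_utf8_bytes {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> Perl character string
    return NFC(lc $text);                 # NFC is optional; lc works without it
}

# The Greek capitals U+0391 U+0398 U+0397 U+039D U+0391 as UTF-8 bytes,
# e.g. as read from a file opened without an encoding layer.
my $bytes = "\xCE\x91\xCE\x98\xCE\x97\xCE\x9D\xCE\x91";

binmode STDOUT, ':encoding(UTF-8)';
print lc_utf8_bytes($bytes), "\n";
```

Calling `lc` on `$bytes` directly would leave it unchanged, since Perl then sees ten unrelated bytes rather than five Greek letters.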

If you're unlucky, you may have to use one of the following:

utf8::upgrade($s);
lc($s)

use feature qw( unicode_strings );
lc($s)
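
The "unlucky" case is, as far as I can tell, a string whose characters all fit in one byte (0x80 to 0xFF) and which Perl stores in its native 8-bit form. A rough sketch of the difference follows; the three sub names are invented for the comparison, and the first result assumes no `use locale` and no `unicode_strings` in scope.

```perl
use strict;
use warnings;

# Plain lc: on a non-upgraded string only ASCII letters are affected.
sub lc_native {
    my ($s) = @_;
    return lc $s;
}

# Workaround 1: upgrade the internal representation first.
sub lc_upgraded {
    my ($s) = @_;
    utf8::upgrade($s);
    return lc $s;
}

# Workaround 2: Unicode semantics regardless of representation (Perl 5.12+).
sub lc_with_feature {
    use feature 'unicode_strings';   # lexically scoped to this sub
    my ($s) = @_;
    return lc $s;
}

my $s = "\xC9COLE";   # U+00C9 (E with acute) followed by "COLE", one byte per character

printf "%-16s %vX\n", 'lc_native',       lc_native($s);
printf "%-16s %vX\n", 'lc_upgraded',     lc_upgraded($s);
printf "%-16s %vX\n", 'lc_with_feature', lc_with_feature($s);
```

The `%vX` format prints the code points in hex, so the first character shows up as C9 when it was left alone and as E9 when it was lowercased.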

If you're still having problems, please provide a sample string. Preferably using

use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper($s));

use Devel::Peek;
Dump($s);
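
For anyone unsure what to look for in that output, here is a small sketch comparing the same Greek letter before and after decoding. `dump_string` is a made-up wrapper around the Data::Dumper call above, and `Terse` is set only to drop the `$VAR1 =` prefix.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(decode);

# Hypothetical wrapper: returns the Useqq dump of a single string.
sub dump_string {
    my ($s) = @_;
    local $Data::Dumper::Useqq = 1;   # escape non-printable and non-ASCII characters
    local $Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix
    return Dumper($s);
}

my $bytes = "\xCE\xA3";                  # UTF-8 encoding of U+03A3, still undecoded
my $text  = decode('UTF-8', $bytes);     # one character, U+03A3

print 'undecoded (length ', length($bytes), '): ', dump_string($bytes);
print 'decoded   (length ', length($text),  '): ', dump_string($text);
```

A decoded string should dump as a single `\x{...}` escape per character, while undecoded text shows two or more byte escapes for each non-ASCII character. The latter usually means the decoding step is missing.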
