http://www.perlmonks.org?node_id=846207

sewa has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I'm expecting the following code to simply lowercase Ü (using Perl 5.8.8):
    use strict;
    use warnings;
    use locale;
    use POSIX qw(locale_h);

    binmode( STDOUT, ":utf8" );

    my $loc = setlocale(LC_CTYPE);
    print "LC_CTYPE=$loc\n";

    my $accented_char = "\x{00dc}"; # Upper case U with DIAERESIS
    print "accented char=$accented_char\n";

    my $lowercased = lc( $accented_char );

    print "lowercased=$lowercased\n";
But it prints:
LC_CTYPE=en_US.UTF-8
accented char=Ü
lowercased=Ü
Based on the perldoc for lc, I believe this should work, but it doesn't. Interestingly, reading the same character from STDIN (with the terminal's character encoding set to UTF-8) lowercases Ü correctly:
    use strict;
    use warnings;
    use Encode;

    binmode( STDIN, ":utf8" );
    binmode( STDOUT, ":utf8" );

    while ( my $char = <> ) {
        chomp $char;
        my $lc_char = lc( $char );
        print "lowercased $char=$lc_char\n";
    }
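One way to narrow this down is a minimal sketch that compares the two cases side by side. It rests on an assumption that is not confirmed anywhere above: that the literal "\x{00dc}" is stored as a plain byte string, while input read through the :utf8 layer carries Perl's internal UTF-8 flag, and that lc treats the two differently. The variable names $native and $upgraded are made up for illustration, and `use locale` is left out on purpose so that the locale does not influence the result.

```perl
use strict;
use warnings;

binmode( STDOUT, ":utf8" );

# Code point below 0x100: Perl stores this literal as a single byte,
# without the internal UTF-8 flag.
my $native = "\x{00dc}";

# Same character, but upgraded to Perl's internal UTF-8 representation,
# which is what the :utf8 input layer is assumed to produce.
my $upgraded = $native;
utf8::upgrade($upgraded);

# Report the flag and the result of lc() for both strings.
printf "native:   flag=%d lc=%s\n", ( utf8::is_utf8($native)   ? 1 : 0 ), lc($native);
printf "upgraded: flag=%d lc=%s\n", ( utf8::is_utf8($upgraded) ? 1 : 0 ), lc($upgraded);
```

If the two lines print different results for lc, the difference between the scripts would come down to how the string is represented internally rather than to the locale setting.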
Any idea as to why the first script wouldn't work? Many thanks.