PerlMonks  

Re: Spanish locale and name sorting

by Anonymous Monk
on May 02, 2009 at 01:30 UTC ( [id://761449]=note )


in reply to Spanish locale and name sorting

We Spanish-speaking people would expect the sorted list to come out with all "mac something" and all "san something" before "maceira" and "sangregorio" respectively, making these longer words come out after surnames whose first word is shorter ("mac" and "san").

Maybe you expect the wrong thing?

#!/usr/bin/perl --
use strict;
use warnings;

localsort("C");
localsort("Spanish - Argentina");

sub localsort {
    use POSIX qw(setlocale LC_CTYPE);
    my( $wantlocale ) = @_;
    my $curlocale = setlocale(LC_CTYPE);
    my $setlocale = setlocale(LC_CTYPE,$wantlocale);
    if( not $setlocale ){
        print "Couldn't switch locale from ($curlocale) to ($wantlocale).\n";
    } else {
        print "Current locale is ($setlocale).\n";
        my @list = ('maceira', 'mac alister', 'mac loughlin',
                    'san esteban', 'sangregorio', 'san zoilo');
        my @yes = do { use locale; sort @list; };
        my @no  = do { no locale;  sort @list; };
        printf "    %-20s %-20s %-20s\n", qw[unsorted use-locale no-locale ];
        print '- ' x 33,"\n";
        for my $i( 0 .. $#list ){
            printf "%3d %-20s %-20s %-20s\n", $i, $list[$i], $yes[$i], $no[$i];
        }
    }
    print '- ' x 33,"\n";
    setlocale(LC_CTYPE,$curlocale);#restore
}
__END__
Current locale is (C).
    unsorted             use-locale           no-locale
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  0 maceira              mac alister          mac alister
  1 mac alister          mac loughlin         mac loughlin
  2 mac loughlin         maceira              maceira
  3 san esteban          san esteban          san esteban
  4 sangregorio          san zoilo            san zoilo
  5 san zoilo            sangregorio          sangregorio
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Current locale is (Spanish_Spain.1252).
    unsorted             use-locale           no-locale
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  0 maceira              mac alister          mac alister
  1 mac alister          mac loughlin         mac loughlin
  2 mac loughlin         maceira              maceira
  3 san esteban          san esteban          san esteban
  4 sangregorio          san zoilo            san zoilo
  5 san zoilo            sangregorio          sangregorio
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Replies are listed 'Best First'.
Re^2: Spanish locale and name sorting
by Jorge_de_Burgos (Beadle) on May 02, 2009 at 11:40 UTC
    Maybe you expect the wrong thing?

    Why would you say that? The output of your code on your system shows that our expectations are right -- if you use some 1252 (I think that means Windows) locale instead of UTF-8.

    This is the output of your program on my system.

    Current locale is (C).
        unsorted             use-locale           no-locale
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      0 maceira              mac alister          mac alister
      1 mac alister          maceira              mac loughlin
      2 mac loughlin         mac loughlin         maceira
      3 san esteban          san esteban          san esteban
      4 sangregorio          sangregorio          san zoilo
      5 san zoilo            san zoilo            sangregorio
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Couldn't switch locale from (es_AR.UTF-8) to (Spanish - Argentina).
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    I am looking for a solution to a problem that arises under Spanish UTF-8 locales, where the sort order treats the space character as nonexistent.
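
One option that does not depend on the system's locale tables is the core module Unicode::Collate. By default it treats the space as a "variable" character and skips it at the primary level, which would give the same kind of ordering described here. Constructing the collator with variable => 'non-ignorable' gives the space its own weight, below the letters. What follows is a minimal sketch, assuming the names are plain strings as in the list above; the variable name $collator is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate;

my @list = ('maceira', 'mac alister', 'mac loughlin',
            'san esteban', 'sangregorio', 'san zoilo');

# 'non-ignorable' makes spaces and punctuation count as ordinary
# characters that sort before letters, instead of being skipped.
my $collator = Unicode::Collate->new(variable => 'non-ignorable');

my @sorted = $collator->sort(@list);
print "$_\n" for @sorted;
```

This uses the default Unicode collation rather than a Spanish tailoring. If the treatment of letters such as ñ matters, Unicode::Collate::Locale may be worth a look.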

      Down and dirty hacks are available of course. For everyday use I have come up with this:

      #!/usr/bin/perl
      use locale;

      my @list = ('maceira', 'mac alister', 'mac loughlin',
                  'san esteban', 'sangregorio', 'san zoilo');

      sub keeping_spaces {
          my $aa = $a;
          my $bb = $b;
          for ($aa) { tr/ /A/; }
          for ($bb) { tr/ /A/; }
          return $aa cmp $bb;
      }

      print "$_\n" for sort keeping_spaces @list;

      Which outputs what we would expect:

      mac alister
      mac loughlin
      maceira
      san esteban
      san zoilo
      sangregorio

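
One caveat with the tr/ /A/ trick: the space now compares like the letter A, so under a locale that ranks "A" and "a" together, a surname beginning with "sana" may land before "san zoilo". A sketch of an alternative is to compare the names word by word, so that a shorter first word always sorts first. This assumes the words are separated by whitespace. The name by_words is hypothetical, and adding use locale would make each per-word cmp locale-aware.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two names one word at a time; the first differing word decides.
sub by_words {
    my @wa = split ' ', $a;
    my @wb = split ' ', $b;
    while (@wa && @wb) {
        my $cmp = shift(@wa) cmp shift(@wb);
        return $cmp if $cmp;
    }
    # All shared words are equal: the name with words left over sorts last.
    return @wa <=> @wb;
}

my @list = ('maceira', 'mac alister', 'mac loughlin',
            'san esteban', 'sangregorio', 'san zoilo');

my @sorted = sort by_words @list;
print "$_\n" for @sorted;
```

With this comparator "mac" is compared against "maceira" as a whole word, so the space never has to be replaced by a stand-in character.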
      Why would you say that?
      Because I don't get the results you expect :) But then I don't have es_AR.UTF-8. Your results column for use-locale seems to ignore setlocale (because it doesn't match mine), but your no-locale column matches mine. I suspect a bug in locale. Can you try again with "es_AR.UTF-8" instead of "Spanish - Argentina"?
        Can you try again with "es_AR.UTF-8" instead of "Spanish - Argentina"?

        Sure thing. As you will see, with this output of your program, we are back to the situation I asked about in the first post of this thread:

        Current locale is (C).
            unsorted             use-locale           no-locale
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
          0 maceira              mac alister          mac alister
          1 mac alister          maceira              mac loughlin
          2 mac loughlin         mac loughlin         maceira
          3 san esteban          san esteban          san esteban
          4 sangregorio          sangregorio          san zoilo
          5 san zoilo            san zoilo            sangregorio
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Current locale is (es_AR.UTF-8).
            unsorted             use-locale           no-locale
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
          0 maceira              mac alister          mac alister
          1 mac alister          maceira              mac loughlin
          2 mac loughlin         mac loughlin         maceira
          3 san esteban          san esteban          san esteban
          4 sangregorio          sangregorio          san zoilo
          5 san zoilo            san zoilo            sangregorio
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

        I suspect a bug in locale.

        So do I, if I am entitled to (I don't know anything about locale).
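
Before blaming locale, it may be worth noting that the test script switches LC_CTYPE, which covers character classification and case mapping. According to perllocale, the order used by sort and cmp under use locale comes from LC_COLLATE, which Perl initializes from the environment at startup. That would explain why the use-locale column stays the same whatever is passed to setlocale. Below is a minimal sketch that switches the collation category instead, assuming the es_AR.UTF-8 locale is installed; sort_under is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Sort @names under the collation locale $want.
# Returns an empty list if that locale cannot be selected.
sub sort_under {
    my ($want, @names) = @_;
    my $old = setlocale(LC_COLLATE);              # remember the current setting
    my $set = setlocale(LC_COLLATE, $want) or return;
    my @sorted = do { use locale; sort @names };  # collation follows LC_COLLATE
    setlocale(LC_COLLATE, $old);                  # restore
    return @sorted;
}

my @list = ('maceira', 'mac alister', 'mac loughlin',
            'san esteban', 'sangregorio', 'san zoilo');

for my $loc ('C', 'es_AR.UTF-8') {
    my @sorted = sort_under($loc, @list);
    if (@sorted) {
        print "$loc: ", join(', ', @sorted), "\n";
    } else {
        print "$loc: locale not available\n";
    }
}
```

If the two lines printed differ, the space handling comes from the collation tables of the system locale rather than from Perl's locale pragma.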
