
### perllocale weirdness, bug, or...?

by Krambambuli (Curate) on Oct 20, 2010 at 13:58 UTC
Krambambuli has asked for the wisdom of the Perl Monks concerning the following question:

The following code
```
#!/usr/bin/perl

use strict;
use warnings;

my $s1 = 'aaa2000@yahoo.com';
my $s2 = 'aaa_2000@yahoo.com';
my $s3 = 'aaa2000';
my $s4 = 'aaa_2000';

no locale;

print "\nNO Locale:\n\n";

if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}

if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}

use locale;

print "\nWith 'use locale;':\n\n";

if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}

if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}
```
prints out
```
NO Locale:

aaa2000@yahoo.com is < aaa_2000@yahoo.com
aaa2000 is < aaa_2000

With 'use locale;':

aaa2000@yahoo.com is > aaa_2000@yahoo.com
aaa2000 is < aaa_2000
```
which I cannot really follow. Am I missing something more or less obvious, or is this a bug? Can others confirm that they see the same behavior?

I see this on both Perl 5.8.8 and Perl 5.10.1.

Krambambuli
---

Replies are listed 'Best First'.
Re: perllocale weirdness, bug, or...?
by Corion (Pope) on Oct 20, 2010 at 14:04 UTC

See locale. Using locale changes your sort order to whatever is considered "natural" for the locale you have set up. I would avoid it, but you can find out which locale is active and, if you still want to use it, set your locale to 'C' for the stretch of code where you want the "usual" sort/string comparison behaviour of Perl.
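A minimal sketch of the "set your locale to 'C' for a while" idea, assuming the standard POSIX module is available. The helper name `with_c_collation` is made up for illustration; it saves the current LC_COLLATE setting, switches to 'C', runs the given code, and then puts the old setting back. It does not restore the locale if the code dies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper (not part of any module): run $code with
# LC_COLLATE temporarily set to "C", then restore the old setting.
sub with_c_collation {
    my ($code) = @_;
    my $saved = setlocale(LC_COLLATE);    # one-argument form only queries
    setlocale(LC_COLLATE, 'C') or die "cannot switch LC_COLLATE to C\n";
    my $result = $code->();
    setlocale(LC_COLLATE, $saved) if defined $saved;    # restore
    return $result;
}

my ($s1, $s2) = ('aaa2000@yahoo.com', 'aaa_2000@yahoo.com');

# 'use locale' makes lt/gt/cmp honour LC_COLLATE, which is "C" here,
# so the comparison falls back to plain character-code order.
my $lt = with_c_collation(sub { use locale; $s1 lt $s2 });
print "$s1 is ", ($lt ? '<' : '>='), " $s2 under C collation\n";
```

With 'C' collation the comparison should agree with the "NO Locale" result, since '2' has a lower character code than '_'.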

I've checked perllocale, but I just can't make sense of it:

how is it possible to have a > b _and_ at the same time a@yahoo.com < b@yahoo.com...?

Krambambuli
---

Oh - I hadn't noticed that contradiction, which runs counter to the intuition that string comparison should proceed from the left. I'm not sure what locale you actually run under. Maybe somebody who has actual working experience with locales can tell from $ENV{LC_ALL} or $ENV{LC_COLLATE} or $ENV{LANG} (see perllocale) what the active locale for your system is and how it affects sorting.
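To see which of those settings is in play, something along these lines may help. It is only a sketch: `collate_from_env` is a made-up helper that applies the usual POSIX precedence (LC_ALL first, then LC_COLLATE, then LANG, with empty values treated as unset), and the last line asks the C library directly via POSIX::setlocale.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: pick the variable that decides collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
# Returns undef if none of them is set to a non-empty value.
sub collate_from_env {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_COLLATE LANG)) {
        my $value = $env->{$var};
        return $value if defined $value && length $value;
    }
    return undef;
}

for my $var (qw(LC_ALL LC_COLLATE LANG)) {
    printf "%-10s = %s\n", $var,
        defined $ENV{$var} ? $ENV{$var} : '(unset)';
}

my $from_env = collate_from_env(\%ENV);
print "from environment : ", (defined $from_env ? $from_env : '(none set)'), "\n";

# What the C library reports as the current collation locale.
my $active = setlocale(LC_COLLATE);
print "setlocale reports: ", (defined $active ? $active : '(unknown)'), "\n";
```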

I would still avoid locales, exactly because they introduce hard-to-track-down behaviour.

Re: perllocale weirdness, bug, or...?
by thundergnat (Deacon) on Oct 20, 2010 at 18:23 UTC

It is almost definitely a locale weirdness thing. To get an idea of your local(e) sort order, try running the following. The result is probably not what you would expect:

```
#!/usr/bin/perl

use strict;
use warnings;

{
    no locale;
    print "\nNO Locale:\n\n";
    print +(join ' ', sort grep /\w/, map { chr } 0..255), "\n";
}

{
    use locale;
    print "\nWith 'use locale;':\n\n";
    print +(join ' ', sort grep /\w/, map { chr } 0..255), "\n";
}
```
Re: perllocale weirdness, bug, or...?
by Krambambuli (Curate) on Oct 20, 2010 at 21:31 UTC
Thanks - I've done this already, but it doesn't explain the seemingly nonsensical ordering I see.

I've made some progress in the meantime, however: it seems to be a problem with how exactly collation is done when LC_COLLATE = en_US.UTF-8, and not a Perl problem. But I still have to understand how it comes about that a sort with this collation gives
_
2
a
a2
a_2
a_2.
a2.

instead of what I would consider 'logical', namely
_
2
a
a_2
a_2.
a2
a2.
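One possible reading of the first list, offered as a guess rather than as a description of what the C library really does: the strings are first compared with punctuation ignored, and punctuation only comes into play as a tie-breaker. The sketch below models that in plain Perl. The function name `model_cmp` and the ranking `_` < `.` < `@` are assumptions inferred from the output in this thread, not taken from any locale definition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model, NOT the system's actual collation algorithm:
#   pass 1 compares the strings with the punctuation _ . @ removed;
#   pass 2 breaks ties on the punctuation alone, ranked _ < . < @
#   (a ranking assumed from the output seen in this thread).
sub model_cmp {
    my ($x, $y) = @_;

    # Pass 1 keys: everything except the punctuation.
    (my $kx = $x) =~ tr/_.@//d;
    (my $ky = $y) =~ tr/_.@//d;

    # Pass 2 keys: only the punctuation, mapped to the assumed ranks.
    (my $px = $x) =~ tr/_.@//cd;
    (my $py = $y) =~ tr/_.@//cd;
    tr/_.@/\x01\x02\x03/ for $px, $py;

    return ($kx cmp $ky) || ($px cmp $py);
}

print "$_\n" for sort { model_cmp($a, $b) } qw(a2. a_2 _ a 2 a_2. a2);
```

Under this model `a2`, `a_2`, `a_2.` and `a2.` all tie in the first pass, so their order is decided by the punctuation sequences (none, `_`, `_.`, `.`), which reproduces the first list. The same model puts `aaa2000@yahoo.com` after `aaa_2000@yahoo.com` while keeping `aaa2000` before `aaa_2000`, matching the 'use locale' output of the original script.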
---
(Update): sorry, misplaced this answer, it should have been a reply to thundergnat's note.
If you are using Unicode (UTF-8) as your locale, also make sure that your data, including the input and output streams, match it, as some "bits" of the whole setup might still not be UTF-8. Also check your locale settings at system/root level and compare them with those of the logged-in user; I've seen weirdness come from that. Finally, make sure no other Perl-specific environment variables are set that could interfere.
the hardest line to type correctly is: stty erase ^H

Node Type: perlquestion [id://866330]