PerlMonks  

perllocale weirdness, bug, or...?

by Krambambuli (Curate)
on Oct 20, 2010 at 13:58 UTC ( [id://866330]=perlquestion )

Krambambuli has asked for the wisdom of the Perl Monks concerning the following question:

The following code
#!/usr/bin/perl
use strict;
use warnings;

my $s1 = 'aaa2000@yahoo.com';
my $s2 = 'aaa_2000@yahoo.com';
my $s3 = 'aaa2000';
my $s4 = 'aaa_2000';

no locale;
print "\nNO Locale:\n\n";
if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}
if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}

use locale;
print "\nWith 'use locale;':\n\n";
if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}
if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}
prints out

NO Locale:

aaa2000@yahoo.com is < aaa_2000@yahoo.com
aaa2000 is < aaa_2000

With 'use locale;':

aaa2000@yahoo.com is > aaa_2000@yahoo.com
aaa2000 is < aaa_2000
which I cannot really follow. Am I missing something more or less obvious, or is this a bug? Can others confirm that they see the same behavior?

I see this on both Perl 5.8.8 and Perl 5.10.1.

Many thanks in advance,

Krambambuli
---

Replies are listed 'Best First'.
Re: perllocale weirdness, bug, or...?
by Corion (Patriarch) on Oct 20, 2010 at 14:04 UTC

    See locale. Using locale changes your sort order to whatever is considered "natural" for the locale you have set up. I would avoid it, but I guess you can find out what locale is active, and if you still want to use it, you can set your locale to 'C' for the stretch of code where you want the "usual" sort/string comparison behaviour of Perl.
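
A minimal sketch of the "set your locale to 'C'" suggestion, assuming the POSIX module's setlocale is acceptable for this. The variable names $previous and $result are only illustrative. It looks up the active collation locale, switches to 'C' for one comparison, and then restores the old value:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

my ($s1, $s2) = ('aaa2000@yahoo.com', 'aaa_2000@yahoo.com');

# Remember whatever collation locale is active so it can be restored later.
my $previous = setlocale(LC_COLLATE);
print "Active LC_COLLATE: ", (defined $previous ? $previous : '(unknown)'), "\n";

# Switch collation to the 'C' locale for the duration of the comparison.
setlocale(LC_COLLATE, 'C');

my $result;
{
    use locale;    # comparisons in this block consult LC_COLLATE, now 'C'
    $result = $s1 lt $s2 ? 'lt' : $s1 gt $s2 ? 'gt' : 'eq';
}
print "Under 'C': $s1 $result $s2\n";

# Put the original collation locale back.
setlocale(LC_COLLATE, $previous) if defined $previous;
```

Under 'C' the comparison is by byte value, so '2' (0x32) sorts before '_' (0x5F) and the result matches the "NO Locale" output in the question.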

      I've checked with perllocale, but I just can't make sense of it:

      how is it possible to have a > b _and_ at the same time a@yahoo.com < b@yahoo.com...?


      Krambambuli
      ---

        Oh - I hadn't seen that contradiction that runs counter to the intuition that "strings comparing larger" should compare starting from the left. I'm not sure what locale you actually run under. Maybe somebody who has actual working experience with locales can tell from $ENV{LC_ALL} or $ENV{LC_COLLATE} or $ENV{LANG} (see perllocale) what the active locale for your system is and how it affects sorting.

        I would still avoid locales, exactly because they introduce hard-to-track-down behaviour.
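
To make the environment-variable check above concrete, here is a small sketch. The helper name collate_source is hypothetical; it only applies the usual POSIX precedence (LC_ALL, then LC_COLLATE, then LANG, with an empty value counting as unset) and then asks the C library what it actually selected:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: report which variable decides collation, using the
# precedence LC_ALL, then LC_COLLATE, then LANG. Empty counts as unset.
sub collate_source {
    my ($env) = @_;
    for my $name (qw(LC_ALL LC_COLLATE LANG)) {
        my $value = $env->{$name};
        return ($name, $value) if defined $value && length $value;
    }
    return ('(default)', 'C');
}

my ($name, $value) = collate_source(\%ENV);
print "Collation is decided by $name = $value\n";

# What the C library reports once Perl has initialised the locale.
my $active = setlocale(LC_COLLATE);
print "setlocale(LC_COLLATE) reports: ",
    (defined $active ? $active : '(unknown)'), "\n";
```

The two lines can disagree, for example when the environment names a locale that is not installed on the system, so the second line is the more reliable of the two.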

        Well what is your locale?
Re: perllocale weirdness, bug, or...?
by thundergnat (Deacon) on Oct 20, 2010 at 18:23 UTC

    It is almost definitely a locale weirdness thing. To get an idea of your local(e) sort order, try running the following. It's probably not what you might expect.

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        no locale;
        print "\nNO Locale:\n\n";
        print +(join ' ', sort grep /\w/, map { chr } 0..255), "\n";
    }

    {
        use locale;
        print "\nWith 'use locale;':\n\n";
        print +(join ' ', sort grep /\w/, map { chr } 0..255), "\n";
    }
Re: perllocale weirdness, bug, or...?
by Krambambuli (Curate) on Oct 20, 2010 at 21:31 UTC
    Thanks - I've done this already, but it doesn't explain the seemingly nonsensical ordering I see.

    I've made some progress in the meantime, however - it seems to be a problem with how exactly collation is done when LC_COLLATE = en_US.UTF-8, and not a Perl problem. But I still have to understand how it comes about that a sort with this collation gives
    _
    2
    a
    a2
    a_2
    a_2.
    a2.

    instead of what I would consider the 'logical' order:
    _
    2
    a
    a_2
    a_2.
    a2
    a2.
    ---
    (Update): sorry, misplaced this answer, it should have been a reply to thundergnat's note.
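
One possible reading of the ordering above, offered as a toy model and not as the actual en_US.UTF-8 collation algorithm: strings are first compared with punctuation left out, and punctuation is only consulted to break ties. The names toy_key, toy_cmp and the %punct_rank table are hypothetical, and the ranks are an assumption chosen only so that '_' comes before '.' and '@':

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: a made-up ranking of punctuation, not taken from any real
# locale definition. It only needs '_' to rank before '.' and '@'.
my %punct_rank = ('_' => 1, '.' => 2, '@' => 3);

# Build a two-part key: the string with punctuation removed, and the
# punctuation characters alone, translated to their assumed ranks.
sub toy_key {
    my ($s) = @_;
    (my $primary = $s) =~ s/[^A-Za-z0-9]//g;
    my $secondary = join '',
        map  { chr(exists $punct_rank{$_} ? $punct_rank{$_} : 9) }
        grep { /[^A-Za-z0-9]/ } split //, $s;
    return ($primary, $secondary);
}

# Compare letters and digits first; punctuation only breaks ties.
sub toy_cmp {
    my ($x, $y) = @_;
    my ($px, $sx) = toy_key($x);
    my ($py, $sy) = toy_key($y);
    return ($px cmp $py) || ($sx cmp $sy);
}

my @sorted = sort { toy_cmp($a, $b) } qw(a2. a_2 2 a _ a_2. a2);
print "$_\n" for @sorted;
```

With these assumptions the model yields the first ordering shown above (_, 2, a, a2, a_2, a_2., a2.), and it also ranks 'aaa2000@yahoo.com' after 'aaa_2000@yahoo.com' while ranking 'aaa2000' before 'aaa_2000', as in the original question. Whether the system's collation tables really work this way would need to be checked against the locale definition itself.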
      If you are using Unicode (UTF-8) as your locale, also make sure your data matches it, including the input and output streams; some "bits" of the overall setup may still not be UTF-8. Also check the locale settings at the system/root level and compare them with those of the logged-in user. I've seen weirdness come from that. Finally, make sure no other Perl-only system variables are set that could interfere.
      the hardest line to type correctly is: stty erase ^H
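
As a small illustration of the point about data and streams matching, here is a sketch that assumes the core Encode module. It shows the same two bytes seen once as raw bytes and once as a decoded character, and declares the output encoding explicitly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes that are the UTF-8 encoding of one character, U+00E9.
my $bytes = "\xC3\xA9";
my $chars = decode('UTF-8', $bytes);

printf "as bytes: length %d\n", length $bytes;
printf "decoded:  length %d\n", length $chars;

# Declare the output encoding so the decoded string is written as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

A string that was never decoded is compared byte by byte in pieces, so mixing decoded and undecoded data is one way to end up with surprising comparison results.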

Node Type: perlquestion [id://866330]
Approved by talexb