PerlMonks  

Re^3: perllocale weirdness, bug, or...?

by Corion (Patriarch)
on Oct 20, 2010 at 14:21 UTC


in reply to Re^2: perllocale weirdness, bug, or...?
in thread perllocale weirdness, bug, or...?

Oh - I hadn't noticed that contradiction; it runs counter to the intuition that "strings comparing larger" is decided by comparing from the left. I'm not sure which locale you are actually running under. Maybe somebody with real working experience of locales can tell from $ENV{LC_ALL}, $ENV{LC_COLLATE} or $ENV{LANG} (see perllocale) which locale is active on your system and how it affects sorting.
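To make the lookup order of those three variables concrete, here is a minimal sketch in shell. It assumes the usual POSIX precedence (a non-empty LC_ALL wins, then LC_COLLATE, then LANG, and the "C" locale applies when none is set). The function name effective_collate is made up for illustration; it only reads the variables and does not check that the named locale is installed.

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG > "C".
effective_collate() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE:-}" ]; then
    printf '%s\n' "$LC_COLLATE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    # Nothing set (or all empty): the default "C" locale applies.
    printf 'C\n'
  fi
}
```

Running effective_collate in the shell that later invokes sort or perl shows which setting both of them will inherit.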

I would still avoid locales, precisely because they introduce hard-to-track-down behaviour.

Re^4: perllocale weirdness, bug, or...?
by Krambambuli (Curate) on Oct 20, 2010 at 14:32 UTC
    The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses, because I want to process in Perl a file that was sorted outside Perl with sort, and I rely on the strict ordering in the file conforming to what Perl uses.

    Krambambuli
    ---

      The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses

      What problem? You should have checked what order the system's sort uses.

      $ cat >data
      aaa2000@yahoo.com
      aaa_2000@yahoo.com
      aaa2000
      aaa_2000
      $ export LC_COLLATE=en_US.UTF-8
      $ sort data
      aaa2000
      aaa_2000
      aaa_2000@yahoo.com
      aaa2000@yahoo.com
      $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
      aaa2000
      aaa_2000
      aaa_2000@yahoo.com
      aaa2000@yahoo.com
      $ export LC_COLLATE=C
      $ sort data
      aaa2000
      aaa2000@yahoo.com
      aaa_2000
      aaa_2000@yahoo.com
      $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
      aaa2000
      aaa2000@yahoo.com
      aaa_2000
      aaa_2000@yahoo.com

      Whether the order makes sense or not, it's doing exactly what you want.
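Since the goal is to rely on the file's existing order, it may help to confirm that order under a named locale before handing the file to Perl. The following is only a sketch: the helper name is_sorted_in is invented here, and it relies on nothing beyond sort -c, which checks for sortedness without producing output.

```shell
# Hypothetical helper: succeed only if FILE is already sorted
# according to the collation rules of the given locale.
is_sorted_in() {
  locale_name=$1
  file=$2
  # sort -c exits non-zero at the first out-of-order line;
  # its diagnostic is discarded because only the status matters here.
  LC_ALL=$locale_name sort -c "$file" 2>/dev/null
}
```

For example, `is_sorted_in C data && echo sorted` reports whether data is in plain byte order; passing the locale that was active when the file was sorted checks that ordering instead.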

        Thank you; indeed, Perl is not at fault.

        I was trying to track down a slightly more complicated issue and was simply on the wrong track for a while.

        Looks like _my_ real problem originates from something I now believe to be an en_US.UTF-8 locale/glibc bug or problem which I can reproduce at system level.

        Given a simple file having two records with two TAB separated fields each, like
        a_2    2
        a2     1
        
        a command like 'sort -k 1 | cut -f 1' on that file displays
        a2
        a_2
        
        (as if the field delimiter were not honoured and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.

        This looks like a bug to me, though I'm not sure yet and so far don't really know to whom I should or could report it.
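One thing that may be worth ruling out first: with sort, a key given as -k 1 and no end position runs from the first field to the end of the line, so the second field takes part in the comparison. Whether that fully explains the order seen under en_US.UTF-8 is not something this sketch settles. It only shows, under stated assumptions, how to restrict the key to the first TAB-separated field. The helper name sort_by_first_field is made up, and -s (stable sort, which turns off the whole-line tie-break) is assumed to be available, as it is in GNU and BSD sort.

```shell
# Hypothetical helper: sort stdin on the first TAB-separated field only.
sort_by_first_field() {
  tab=$(printf '\t')
  # -t sets TAB as the field separator, -k 1,1 ends the key at field 1,
  # -s keeps the input order for lines whose first fields compare equal.
  sort -s -t "$tab" -k 1,1
}
```

Comparing `sort -k 1 file | cut -f 1` with `sort_by_first_field < file | cut -f 1` under the same locale should show whether the second field was influencing the result.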

        Thanks!
        ---
