
Re^5: perllocale weirdness, bug, or...?

by ikegami (Pope)
on Oct 20, 2010 at 21:43 UTC ( #866413=note )

in reply to Re^4: perllocale weirdness, bug, or...?
in thread perllocale weirdness, bug, or...?

The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses

What problem? You should have checked what order the system's sort uses.

$ cat >data
aaa2000
aaa_2000
$ export LC_COLLATE=en_US.UTF-8
$ sort data
aaa2000
aaa_2000
$ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
aaa2000
aaa_2000
$ export LC_COLLATE=C
$ sort data
aaa2000
aaa_2000
$ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
aaa2000
aaa_2000

Whether the order makes sense or not, it's doing exactly what you want.

Replies are listed 'Best First'.
Re^6: perllocale weirdness, bug, or...?
by Krambambuli (Curate) on Oct 21, 2010 at 10:08 UTC
    Thank you; indeed, Perl is not at fault.

    As I was trying to track down a slightly more complicated issue, I was just on the wrong track for a while.

    Looks like _my_ real problem originates from something I now believe to be an en_US.UTF-8 locale/glibc bug or problem which I can reproduce at system level.

    Given a simple file having two records with two TAB separated fields each, like
    a_2    2
    a2     1
    a command like 'sort -k 1 | cut -f 1' on that file displays
    a2
    a_2
    (as if the field delimiter didn't work and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.

    This looks like a bug to me, though I'm not sure yet and don't really know so far to whom I should or could report it.
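To rule out stray spaces in the test file, the report can be reproduced with a file built by printf, so the separators are guaranteed to be literal TABs, and the same pipeline can be run under the C locale as a baseline. What follows is only a minimal sketch, assuming a POSIX shell with sort, cut and mktemp available. The file names two_fields.txt and one_field.txt and the variable names are made up for illustration, and the en_US.UTF-8 results are printed rather than assumed, since they depend on the installed locale data.

```shell
#!/bin/sh
# Hypothetical scratch files; the names are not from the thread.
dir=$(mktemp -d)
printf 'a_2\t2\na2\t1\n' > "$dir/two_fields.txt"   # literal TABs between fields
printf 'a_2\na2\n'       > "$dir/one_field.txt"    # same keys, no second field

# Baseline: plain byte order, where '2' (0x32) sorts before '_' (0x5F).
c_two=$(LC_ALL=C sort -k 1 "$dir/two_fields.txt" | cut -f 1)
c_one=$(LC_ALL=C sort "$dir/one_field.txt")

# Same commands under the locale from the thread; the result is only printed.
u_two=$(LC_ALL=en_US.UTF-8 sort -k 1 "$dir/two_fields.txt" | cut -f 1)
u_one=$(LC_ALL=en_US.UTF-8 sort "$dir/one_field.txt")

printf 'C, two fields:\n%s\nC, one field:\n%s\n' "$c_two" "$c_one"
printf 'en_US.UTF-8, two fields:\n%s\nen_US.UTF-8, one field:\n%s\n' "$u_two" "$u_one"
rm -r "$dir"
```

Under the C locale both variants come out in the same order, so any difference between the two en_US.UTF-8 results points at the locale's collation rules rather than at the file.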


      Maybe trailing numbers are special. Add one character to every item you want to sort, sort the items, then chop them.

      Update: I guess that only works if the character is the one with the lowest sort order, and which one that is isn't known.

      You should use sort -k1,1 to make sort ignore the rest of the line.
        Thanks for the tip - I really hoped it might solve the issue, but unfortunately it does not.

        For the file test.txt (TAB separated fields) being
        a_2    2
        a2     1
        I still get
        $ sort -k 1,1 test.txt | cut -f 1
        a2
        a_2
        whereas when there is no second field then the result is reversed,
        a_2
        a2
        Looks like the problem 'survives'.
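The difference between -k 1 and -k 1,1 from the tip earlier in the thread is easiest to see with two lines whose first fields are identical. This is a rough sketch, not a diagnosis of the result reported here. It assumes a sort that supports -s (GNU, BSD and BusyBox do; POSIX does not require it), pins LC_ALL=C so the locale cannot interfere, and passes the separator explicitly with -t. The TAB variable and the sample data are illustrative only.

```shell
#!/bin/sh
TAB=$(printf '\t')             # literal TAB to hand to sort -t
data=$(printf 'a\t2\na\t1\n')  # same first field, different second field

# -k 1: the key runs from field 1 to the end of the line, so the
# second field takes part in the comparison.
whole=$(printf '%s\n' "$data" | LC_ALL=C sort -t "$TAB" -k 1)

# -k 1,1: the key stops at the end of field 1. With -s (stable, no
# last-resort whole-line comparison) equal keys keep their input order.
first=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t "$TAB" -k 1,1)

printf '%s\n--\n%s\n' "$whole" "$first"
```

Without -s, sort falls back to comparing whole lines when the keys are equal, so the rest of the line can still influence the output even with -k 1,1. Rerunning the failing case with -s and an explicit -t may help show whether the keys themselves are being treated as equal under en_US.UTF-8.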
