PerlMonks  

Re^5: perllocale weirdness, bug, or...?

by ikegami (Pope)
on Oct 20, 2010 at 21:43 UTC (#866413)


in reply to Re^4: perllocale weirdness, bug, or...?
in thread perllocale weirdness, bug, or...?

The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses

What problem? You should have checked what order the system's sort uses.

    $ cat >data
    aaa2000@yahoo.com
    aaa_2000@yahoo.com
    aaa2000
    aaa_2000
    $ export LC_COLLATE=en_US.UTF-8
    $ sort data
    aaa2000
    aaa_2000
    aaa_2000@yahoo.com
    aaa2000@yahoo.com
    $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
    aaa2000
    aaa_2000
    aaa_2000@yahoo.com
    aaa2000@yahoo.com
    $ export LC_COLLATE=C
    $ sort data
    aaa2000
    aaa2000@yahoo.com
    aaa_2000
    aaa_2000@yahoo.com
    $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
    aaa2000
    aaa2000@yahoo.com
    aaa_2000
    aaa_2000@yahoo.com

Whether the order makes sense or not, it's doing exactly what you want.
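If you want the same comparison inside a script rather than a one-liner, here is a minimal sketch, assuming a POSIX system. It sets LC_COLLATE explicitly with POSIX::setlocale instead of relying on the exported environment variable. The helper name locale_sort is hypothetical, and whether en_US.UTF-8 is installed depends on the host, so the script skips any locale it cannot set.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort strings the way the current LC_COLLATE orders
# them. Under "use locale", sort's default string comparison uses the
# locale's collation instead of plain byte order.
sub locale_sort {
    use locale;
    return sort @_;
}

# Set the collation explicitly instead of relying on the environment.
# "C" always exists; en_US.UTF-8 may or may not be installed on this host.
for my $loc ('C', 'en_US.UTF-8') {
    unless (defined setlocale(LC_COLLATE, $loc)) {
        warn "locale $loc is not available, skipping\n";
        next;
    }
    print "== LC_COLLATE=$loc ==\n";
    print "$_\n"
        for locale_sort(qw(aaa2000@yahoo.com aaa_2000@yahoo.com aaa2000 aaa_2000));
}
```

Under the C locale the output should match the C-locale listing from the shell session, since C collation is plain byte order.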


Re^6: perllocale weirdness, bug, or...?
by Krambambuli (Deacon) on Oct 21, 2010 at 10:08 UTC
    Thank you; indeed, Perl is not to blame.

    I was trying to track down a slightly more complicated issue and was simply on the wrong track for a while.

    It looks like _my_ real problem originates in what I now believe to be an en_US.UTF-8 locale/glibc bug or problem, which I can reproduce at the system level.

    Given a simple file with two records of two TAB-separated fields each, like
    a_2    2
    a2     1
    
    a command like 'sort -k 1 | cut -f 1' on that file displays
    a2
    a_2
    
    (as if the field delimiter were ignored and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.

    This looks like a bug to me, though I'm not sure yet, and so far I don't know whom I should or could report it to.

    Thanks!
    ---

      Maybe trailing numbers are special. Add one character to every item you want to sort, sort the items, then chop the added character off again.

      Update: I guess that only works if the added character is the one with the lowest sort order, and which character that is isn't known.
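A minimal sketch of the append, sort, chop idea, assuming the caller supplies a single-character suffix (the helper suffix_sort and its $suffix parameter are hypothetical; "\x01" is just an example choice). As the update says, it can only help if that character collates lowest in the active locale, and this code does not check that.

```perl
use strict;
use warnings;
use locale;

# Hypothetical helper for the suffix trick: append one character to every
# item, sort under the current locale, then chop that character off again.
# Assumes $suffix is exactly one character long.
sub suffix_sort {
    my ($suffix, @items) = @_;
    my @sorted = sort { $a cmp $b } map { $_ . $suffix } @items;
    chop for @sorted;    # drop the single character appended above
    return @sorted;
}

print "$_\n" for suffix_sort("\x01", qw(a_2 a2));
```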

      You should use sort -k1,1 to make sort ignore the rest of the line.
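For comparison with sort -k1,1, here is a sketch of the same first-field ordering in Perl (the helper sort_by_first_field is hypothetical). GNU sort's documentation says that when all keys compare equal it compares entire lines as a last resort, and that -s disables this; the second comparison in the sort block imitates that rule. If the en_US.UTF-8 collation ranked a_2 and a2 as equal, that fallback would let the rest of the line decide the order, but that is an untested hypothesis, not a diagnosis.

```perl
use strict;
use warnings;
use locale;

# Hypothetical helper: order TAB-separated records by their first field,
# roughly what "sort -k1,1" does for records like these. When the keys
# compare equal, fall back to comparing whole lines, as GNU sort does
# unless -s is given.
sub sort_by_first_field {
    my @records = @_;
    return map  { $_->[1] }                          # undecorate
           sort { $a->[0] cmp $b->[0]                # compare the key field
                  || $a->[1] cmp $b->[1] }           # last resort: whole line
           map  { [ (split /\t/, $_, 2)[0], $_ ] }   # decorate: [key, line]
           @records;
}

print "$_\n" for sort_by_first_field("a_2\t2", "a2\t1");
```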
        Thanks for the tip - I really hoped it might solve the issue, but unfortunately it does not.

        For the file test.txt (TAB-separated fields) being
        a_2    2
        a2     1

        I still get
        $ sort -k 1,1 test.txt | cut -f 1
        a2
        a_2

        whereas when there is no second field, the result is reversed:
        a_2
        a2

        It looks like the problem 'survives'.
        ---
