PerlMonks  

Re^6: perllocale weirdness, bug, or...?

by Krambambuli (Deacon)
on Oct 21, 2010 at 10:08 UTC (#866522)


in reply to Re^5: perllocale weirdness, bug, or...?
in thread perllocale weirdness, bug, or...?

Thank you; indeed, Perl is not at fault.

As I was trying to track down a slightly more complicated issue, I was simply on the wrong track for a while.

Looks like _my_ real problem originates from something I now believe to be an en_US.UTF-8 locale/glibc bug or problem, which I can reproduce at the system level.

Given a simple file with two records of two TAB-separated fields each, like

a_2    2
a2     1

a command like 'sort -k 1 | cut -f 1' on that file displays

a2
a_2

(as if the field delimiter didn't work and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.
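
For comparison, here is a minimal sketch of the same experiment run under the C locale, where sort compares raw bytes and no locale collation rules apply. The variable names (tab, sample, with_field, without_field) are made up for this illustration, and TAB is passed explicitly with -t so that only the first field is used as the key.

```shell
# Hypothetical names for this sketch: tab, sample, with_field, without_field.
tab=$(printf '\t')

# Two records, two TAB-separated fields each, as in the post.
sample=$(printf 'a_2\t2\na2\t1')

# LC_ALL=C forces plain byte comparison for this one sort invocation.
with_field=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k1,1 | cut -f1)

# Same data with the second field cut away before sorting.
without_field=$(printf '%s\n' "$sample" | cut -f1 | LC_ALL=C sort)

printf 'with second field:\n%s\n' "$with_field"
printf 'without second field:\n%s\n' "$without_field"
```

Under byte comparison both orderings should come out identical, which at least suggests the flip described above comes from the locale's collation rules rather than from the field handling in sort.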

This looks like a bug to me, though I'm not sure yet, and so far I don't know to whom I should report it.

Thanks!
---


Re^7: perllocale weirdness, bug, or...?
by ikegami (Pope) on Oct 21, 2010 at 16:01 UTC

    Maybe trailing numbers are special. Add one character to every item you want to sort, sort the items, then chop them.

    Update: I guess that only works if the character is the one with the lowest sort order, and which one that is isn't known.

Re^7: perllocale weirdness, bug, or...?
by choroba (Abbot) on Oct 22, 2010 at 11:21 UTC
    You should use sort -k1,1 to make sort ignore the rest of the line.
      Thanks for the tip. I really hoped it might solve the issue, but unfortunately it does not.

      For the file test.txt (TAB separated fields) being

      a_2    2
      a2     1

      I still get

      $ sort -k 1,1 test.txt | cut -f 1
      a2
      a_2

      whereas when there is no second field the result is reversed,

      a_2
      a2

      Looks like the problem 'survives'.
      ---
        Looks like a two-level sort: on the first level it ignores the _, but if the strings differ only by the underscore, it is used on the second level.

        $ echo $'a_2\t2\na2\t1' | sort -k1
        a2      1
        a_2     2
        $ echo $'a_2\t2\na2\t1' | sort -k1,1
        a2      1
        a_2     2
        $ echo $'a2\t2\na_2\t1' | sort -k1,1
        a2      2
        a_2     1
        $ echo $'a2\t2\na_2\t1' | sort -k1
        a_2     1
        a2      2
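
The two-level idea can be approximated in a few lines of shell. This is only an emulation under the assumption that the locale skips '_' and TAB on its first pass, which is what the outputs above suggest; it is not how glibc actually implements collation, and level_one_key is a hypothetical helper invented for this sketch.

```shell
# Hypothetical helper: approximates a first pass that skips '_' and TAB.
# Emulation for illustration only, not the real collation algorithm.
level_one_key() {
    printf '%s' "$1" | tr -d '_\t'
}

# Whole-line keys, as with 'sort -k1'.
key_a=$(level_one_key "$(printf 'a_2\t2')")   # becomes a22
key_b=$(level_one_key "$(printf 'a2\t1')")    # becomes a21

# First-field keys, as with 'sort -k1,1'.
key_c=$(level_one_key 'a_2')                  # becomes a2
key_d=$(level_one_key 'a2')                   # becomes a2

printf '%s %s %s %s\n' "$key_a" "$key_b" "$key_c" "$key_d"
```

With whole-line keys the stripped strings differ in the last digit, so the second field ends up deciding the order. With first-field keys the stripped strings tie, so only then would the underscore come into play as a tie-breaker.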
