Re^6: perllocale weirdness, bug, or...?

by Krambambuli (Deacon)
on Oct 21, 2010 at 10:08 UTC (#866522)


in reply to Re^5: perllocale weirdness, bug, or...?
in thread perllocale weirdness, bug, or...?

Thank you; indeed, Perl is not to blame.

While trying to track down a slightly more complicated issue, I was simply on the wrong track for a while.

It looks like _my_ real problem originates from what I now believe to be a bug or quirk in the en_US.UTF-8 locale or glibc, which I can reproduce at the system level.

Given a simple file with two records of two TAB-separated fields each, like

a_2    2
a2     1

a command like 'sort -k 1 | cut -f 1' on that file displays

a2
a_2

(as if the field delimiter were ignored and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.

This looks like a bug to me, though I'm not sure yet, and so far I don't really know to whom I should or could report it.

Thanks!
---


Re^7: perllocale weirdness, bug, or...?
by ikegami (Pope) on Oct 21, 2010 at 16:01 UTC

    Maybe trailing numbers are special. Add one character to every item you want to sort, sort the items, then chop them.

    Update: I guess that only works if the character is the one with the lowest sort order, and which one that is isn't known.

Re^7: perllocale weirdness, bug, or...?
by choroba (Canon) on Oct 22, 2010 at 11:21 UTC
    You should use sort -k1,1 to make sort ignore the rest of the line.
      Thanks for the tip - I really hoped it might solve the issue, but unfortunately it does not.

      For the file test.txt (TAB separated fields) being
      a_2    2
      a2     1
      I still get
      $ sort -k 1,1 test.txt | cut -f 1
      a2
      a_2
      whereas when there is no second field then the result is reversed,
      a_2
      a2
      Looks like the problem 'survives'.
      ---
        Looks like a two-level sort: on the first level it ignores the _, but if the strings differ only by the underscore, it is used on the second level.
        $ echo $'a_2\t2\na2\t1' | sort -k1
        a2      1
        a_2     2
        $ echo $'a_2\t2\na2\t1' | sort -k1,1
        a2      1
        a_2     2
        $ echo $'a2\t2\na_2\t1' | sort -k1,1
        a2      2
        a_2     1
        $ echo $'a2\t2\na_2\t1' | sort -k1
        a_2     1
        a2      2
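
One possible way to take the locale out of the picture, sketched here under assumptions: force byte-wise collation with LC_ALL=C and name TAB explicitly as the field separator. The helper name sort_first_field is made up for illustration and does not come from this thread. With byte-wise comparison '2' (0x32) sorts before '_' (0x5F), so a2 precedes a_2 whatever the second field contains. This is not the locale-aware order, so it may not be what you want for natural-language text.

```shell
#!/bin/sh
# Sketch: sort on the first TAB-separated field only, using
# byte-wise (C locale) collation instead of en_US.UTF-8 rules.
# sort_first_field is a hypothetical helper name.
sort_first_field() {
    tab=$(printf '\t')                       # a literal TAB for sort -t
    LC_ALL=C sort -t "$tab" -k1,1 | cut -f 1
}

# The three inputs discussed in the thread; each prints a2, then a_2.
printf 'a_2\t2\na2\t1\n' | sort_first_field
printf 'a2\t2\na_2\t1\n' | sort_first_field
printf 'a_2\na2\n'       | sort_first_field
```

The LC_ALL=C assignment applies only to the sort invocation, so the rest of the shell session keeps its locale.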
