http://www.perlmonks.org?node_id=866341



The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses. I want to process under Perl a file that was sorted outside Perl with sort, and I rely on the strict ordering in the file conforming to the ordering Perl uses.

Krambambuli
---

Re^5: perllocale weirdness, bug, or...?
by ikegami (Patriarch) on Oct 20, 2010 at 21:43 UTC

    The problem is that I want to _reproduce_ under Perl the same order relation that the system level sort uses

    What problem? You should have checked what order the system's sort uses.

    $ cat >data
    aaa2000@yahoo.com
    aaa_2000@yahoo.com
    aaa2000
    aaa_2000

    $ export LC_COLLATE=en_US.UTF-8

    $ sort data
    aaa2000
    aaa_2000
    aaa_2000@yahoo.com
    aaa2000@yahoo.com

    $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
    aaa2000
    aaa_2000
    aaa_2000@yahoo.com
    aaa2000@yahoo.com

    $ export LC_COLLATE=C

    $ sort data
    aaa2000
    aaa2000@yahoo.com
    aaa_2000
    aaa_2000@yahoo.com

    $ perl -le'use locale; chomp(@a=<>); print for sort @a;' data
    aaa2000
    aaa2000@yahoo.com
    aaa_2000
    aaa_2000@yahoo.com

    Whether the order makes sense or not, it's doing exactly what you want.

      Thank you; indeed, Perl is not to blame.

      As I was trying to track down a slightly more complicated issue, I was simply on the wrong track for a while.

      Looks like _my_ real problem originates from something I now believe to be an en_US.UTF-8 locale/glibc bug or problem which I can reproduce at system level.

      Given a simple file having two records with two TAB-separated fields each, like
      a_2    2
      a2     1
      
      a command like 'sort -k 1 | cut -f 1 ' on that file displays
      a2
      a_2
      
      (as if the field delimiter didn't work and the second field determined the ordering), whereas on the same file _without_ the second field the sort order is reversed.

      This looks like a bug to me, though I'm not sure yet and so far don't really know to whom I should or could report it.

      Thanks!
      ---

        Maybe trailing numbers are special. Add one character to every item you want to sort, sort the items, then chop them.

        Update: I guess that only works if the character is the one with the lowest sort order, and which one that is isn't known.

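        You should use sort -k1,1 to make sort ignore the rest of the line. With a bare -k 1 the key runs from field 1 to the end of the line, so the second field takes part in the comparison.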
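
A minimal sketch of that suggestion, run on the two-record sample from earlier in the thread. Two things here are assumptions and not part of the original posts: the explicit `-t` option, which makes TAB the field separator instead of relying on sort's default blank-based splitting, and `LC_ALL=C`, which is used only so the result is byte-wise and does not depend on which locales are installed. The variable names are made up for the example.

```shell
# Literal TAB character, used as the field separator (assumption: the data is TAB-separated).
tab=$(printf '\t')

# Two records with two TAB-separated fields each, as in the sample above.
# -k1,1 ends the sort key at field 1; a bare -k1 would extend it to end of line.
# LC_ALL=C is an assumption for reproducibility; it forces byte-wise comparison.
first_field_sorted=$(printf 'a_2\t2\na2\t1\n' | LC_ALL=C sort -t "$tab" -k1,1 | cut -f 1)

printf '%s\n' "$first_field_sorted"
# prints:
# a2
# a_2
```

To look at the behaviour discussed in this thread, replace `LC_ALL=C` with `LC_ALL=en_US.UTF-8` (if that locale is installed) and compare the result of `-k1,1` against a bare `-k1`.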