http://www.perlmonks.org?node_id=1077065

lpwevers has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,
I'm faced with an issue with Perl in which you can hopefully assist me. I have a Perl program that needs to run on both Sun Solaris and RedHat Enterprise Linux 6. One of the things it needs to do is sort a list. However, on Solaris and Linux the sort order is different when it comes to special characters. I tried using locale settings to avoid this issue, but to no avail. Below is an example script and the output on both Linux and Solaris.

    #!/usr/bin/perl
    $ENV{LANG} = 'en_US';
    $ENV{LC_ALL} = 'en_US.ISO8859-1';
    use strict;
    use warnings;
    use locale;

    my $i;
    my @sortedList;
    my @toSort = ('SortTest', 'TestSort', 'Sort_Test', 'Test_Sort',
                  'Test1_Sort', 'Sort1_Test', 'Sort_1Test', 'Test_1Sort');

    @sortedList = sort (@toSort);
    for ($i = 0; $i <= $#sortedList; $i++) {
        print "$i:\t$sortedList[$i]\n";
    }

Running this gives the output below:

    Linux            Solaris
    0: Sort_1Test    0: Sort_1Test
    1: Sort1_Test    1: Sort_Test
    2: SortTest      2: Sort1_Test
    3: Sort_Test     3: SortTest
    4: Test_1Sort    4: Test_1Sort
    5: Test1_Sort    5: Test_Sort
    6: TestSort      6: Test1_Sort
    7: Test_Sort     7: TestSort

As can be seen, Linux prefers numbers over special characters, while on Solaris it's the other way round.

The question of course is, how can I make both Linux and Solaris behave equal?

Many thanks in advance.
Louis

Replies are listed 'Best First'.
Re: Difference in sort order between Solaris and Linux
by moritz (Cardinal) on Mar 05, 2014 at 12:18 UTC
Re: Difference in sort order between Solaris and Linux
by Anonymous Monk on Mar 05, 2014 at 11:42 UTC

    The question of course is, how can I make both Linux and Solaris behave equal?

    Um, fix your locales (the actual system files ... man locale..) so they match? See https://help.ubuntu.com/community/Locale and perllocale

    FWIW, on my win32 I get the same as your solaris

    0: Sort_1Test
    1: Sort_Test
    2: Sort1_Test
    3: SortTest
    4: Test_1Sort
    5: Test_Sort
    6: Test1_Sort
    7: TestSort

    Which probably means I don't have that locale installed, and/or the locale that "use locale;" picks up isn't the one you want it to be. perllocale might have something to say on how to check which locales are available and which one is being used.

Re: Difference in sort order between Solaris and Linux
by lpwevers (Acolyte) on Mar 05, 2014 at 13:24 UTC
    Hi,

    All, thanks for the suggestions. I've tried them all. So, I can now confirm that:

  • The locales are installed on both servers (locale -a lists them on both)
  • I've put the setting of the locale in a BEGIN { .. } block
  • Just to make sure, I've tried other, installed locales, as well.
  • If I remove 'use locale' from the Perl code, I can see sort behaving differently. So as far as I can tell, the locale setting is being taken into account.
  • For completeness, on Solaris the locale is called 'en_US.ISO8859-1', whilst on Linux it's called 'en_US.iso88591'. Just to make sure, I tried both names on both systems.
  • All to no avail I'm afraid.
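
One way to take the environment variables out of the picture is to ask for the collation locale explicitly and look at what comes back. The following is only a sketch: it uses setlocale and LC_COLLATE from the core POSIX module, and the two candidate names are the ones mentioned in the list above. Whether either name is accepted depends entirely on the system it runs on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Candidate names taken from the thread; availability is system-specific.
my @candidates = ('en_US.ISO8859-1', 'en_US.iso88591');

my %accepted;
for my $name (@candidates) {
    # setlocale returns the locale string on success and undef on failure.
    my $got = setlocale(LC_COLLATE, $name);
    $accepted{$name} = $got;
    printf "%-18s %s\n", $name,
        defined $got ? "accepted as $got" : 'not available';
}

# Switch back to the "C" locale, which POSIX requires every system to provide.
my $c = setlocale(LC_COLLATE, 'C');
print "collation locale is now: ", (defined $c ? $c : 'unknown'), "\n";
```

If a name is reported as not available, a script that requests it cannot be sorting under that locale, whatever %ENV contains. Note that even when both systems accept the name, their collation tables may still differ, which is what the outputs in this thread suggest.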

      Hi,

      To your point "...If I remove 'use locale' from the Perl code, I can see sort behaving differently...". Have you tried to call your script on both environments this way:

      LANG=C perl sortscript.pl

      What is the outcome in this case?

      McA

        Hi, Ok, just tried that. There's no difference in the outcome. To make sure I also double checked that the locale settings in the shell are set to the right iso8859 value.
Re: Difference in sort order between Solaris and Linux
by Laurent_R (Canon) on Mar 05, 2014 at 22:30 UTC
    Hi, my understanding is that you don't really care how special characters are sorted out, provided they get sorted the same way on your two platforms. Based on this assumption, and continuing on my idea of preprocessing your data, I think that the following program (using option 1 of my previous post) should give you the same results on both platforms:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @toSort = ('SortTest', 'TestSort', 'Sort_Test', 'Test_Sort',
                  'Test1_Sort', 'Sort1_Test', 'Sort_1Test', 'Test_1Sort');
    my @sortedList = map {$_->[0]}
                     sort {$a->[1] cmp $b->[1]}
                     map { my $c= $_; $c =~ s/_//g; [$_, $c] } @toSort;
    print join "\n", @sortedList;

    I believe that this program should give you the same following output on both platforms (but, of course, I can't test, it is up to you to try):

    $ perl sort_files.pl
    Sort1_Test
    Sort_1Test
    SortTest
    Sort_Test
    Test1_Sort
    Test_1Sort
    TestSort
    Test_Sort

    This program uses the principle of the Schwartzian Transform I mentioned in my earlier post. The main 3-line sorting command should be read from bottom to top to be understood. Basically, the map at the bottom transforms each element of the original array into a record (anonymous array) of 2 elements: the original value and a version of it without the '_' (something like ['Test_Sort', 'TestSort']). Then the sort line sorts the records on the second element of each record (i.e. the filename without the '_'), and, finally, the top map extracts the original value from the sorted records.

    This will probably be very slightly slower than your sort program, but only by a small margin. You are unlikely to even notice the difference unless your input has dozens or hundreds of megabytes. Please ask if you need further explanations.
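
One detail worth noting: with the underscores stripped, several names get identical sort keys ('Sort1_Test' and 'Sort_1Test' both become 'Sort1Test'), so their relative order depends on how sort treats ties. The variant below is a sketch, not the code from the post above. It adds a tie-breaker on the original string as an assumption about what is wanted, so that the result does not depend on the order of the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @toSort = ('SortTest', 'TestSort', 'Sort_Test', 'Test_Sort',
              'Test1_Sort', 'Sort1_Test', 'Sort_1Test', 'Test_1Sort');

my @sortedList =
    map  { $_->[0] }                          # 3. recover the original value
    sort { $a->[1] cmp $b->[1]                # 2. compare the stripped keys...
           or $a->[0] cmp $b->[0] }           #    ...and break ties on the original
    map  { (my $key = $_) =~ s/_//g;          # 1. build [original, key] records
           [ $_, $key ] }
    @toSort;

print "$_\n" for @sortedList;
```

For the eight sample names this produces the same order as the listing above. No "use locale" is in effect here, so cmp compares by character code and does not consult the system's collation tables.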
      Hi Laurent

      Thanks for the explanation. I've tested this and found that indeed the output is the same on both Solaris and Linux. So instead of using the Perl sort function I'll use the Schwartzian Transform as you suggested.

      Kind Regards,
      Louis
Re: Difference in sort order between Solaris and Linux
by karlgoethebier (Abbot) on Mar 05, 2014 at 13:25 UTC

    I agree with moritz.

    Perhaps similar as with DBD::Sybase:

    BEGIN { $ENV{LANG} = q(C); $ENV{SYBASE} = q(/opt/sybase); }

    Doesn't work without a BEGIN block either.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Difference in sort order between Solaris and Linux
by choroba (Cardinal) on Mar 05, 2014 at 11:42 UTC
    I am getting a different result on RedHat 6.4, Perl 5.10.1:
    0: Sort1_Test
    1: SortTest
    2: Sort_1Test
    3: Sort_Test
    4: Test1_Sort
    5: TestSort
    6: Test_1Sort
    7: Test_Sort
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Difference in sort order between Solaris and Linux
by VincentK (Beadle) on Mar 05, 2014 at 18:54 UTC
Re: Difference in sort order between Solaris and Linux
by Laurent_R (Canon) on Mar 05, 2014 at 18:21 UTC
    If nothing else works, you could perhaps do one of the following two things: (1) preprocess the data to change the characters that create a problem into something else (and post process the data back the other way around afterwards); the Schwartzian Transform (just "google" these words if you don't know what it is) might be a way to do it; (2) write your custom compare subroutine to replace the default cmp function. Depending on your real data one or the other solution might be practical or almost impossible; it is also likely to be slower, but if your data is not too large, this might not be a problem.
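
Option (2) might look something like the sketch below. The subroutine name by_codepoint is made up for this illustration. It simply switches locale handling off inside the comparison, so cmp falls back to comparing character codes, which should not vary between ASCII-based platforms.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use locale;    # still in effect for the rest of the program

my @toSort = ('SortTest', 'TestSort', 'Sort_Test', 'Test_Sort',
              'Test1_Sort', 'Sort1_Test', 'Sort_1Test', 'Test_1Sort');

# Hypothetical comparator: 'no locale' is lexically scoped, so only the
# cmp inside this subroutine ignores the system collation tables.
sub by_codepoint {
    no locale;
    return $a cmp $b;
}

my @sorted = sort by_codepoint @toSort;
print "$_\n" for @sorted;
```

In ASCII, digits sort before uppercase letters, which sort before '_', so this gives Sort1_Test, SortTest, Sort_1Test, Sort_Test, Test1_Sort, TestSort, Test_1Sort, Test_Sort. That is the same order choroba and RMGir report elsewhere in this thread. If a different ordering of '_' relative to digits is needed, the body of the subroutine is the place to encode it.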
Re: Difference in sort order between Solaris and Linux
by RMGir (Prior) on Mar 06, 2014 at 14:29 UTC
    Just to add to the fun, on an AIX box I get something different from both Linux AND Sun for

    LC_ALL="en_US.ISO8859-1" sort ~/tmp/sortdata:

    Sort1_Test
    SortTest
    Sort_1Test
    Sort_Test
    Test1_Sort
    TestSort
    Test_1Sort
    Test_Sort
    
    I checked, and the locale is valid on the AIX host as well.

    Mike