PerlMonks  

Does 'use locale' Change What Is Considered To Be 'alphabetical order'?

by Old_Gray_Bear (Bishop)
on Aug 07, 2010 at 23:40 UTC

Old_Gray_Bear has asked for the wisdom of the Perl Monks concerning the following question:

I have been going cross-eyed over this for the last hour or so. Each time I re-read the doc on use locale; I come up with a different opinion of how this code snippet should work.

    $x = "\x{9e}"; # A-acute accent
    {
        use local;
        ($x =~ /a..zA..Z/) ? print "true\n" : print "not true\n";
    }

The code I am debugging takes a list of words and sorts them "alphabetically". Most of the time it works as expected, but now we are starting to get a few French-Canadian words in the corpus, and "A accent-egu" (I think that's right; I am not a Francophone) falls at the end of the 'normal' alphabet (following Z), rather than sorting between 'A' and 'B' where I am told it should.

I initially looked at the code and said "/self, it's a locale issue" and started reading the doc on the 'locale' pragma. Three cups of coffee later, I still don't understand why I'm not getting the 'correct' sort. I am obviously missing something basic here, but what?

----
I Go Back to Sleep, Now.

OGB

Replies are listed 'Best First'.
Re: Does 'use locale' Change What Is Considered To Be 'alphabetical order'?
by ikegami (Patriarch) on Aug 08, 2010 at 00:30 UTC

    Yes.

    >perl -le"use open ':std', ':encoding(cp850)'; use if $ARGV[0], 'locale'; print sort 'a', chr(0xE1), 'b'" 0
    abá

    >perl -le"use open ':std', ':encoding(cp850)'; use if $ARGV[0], 'locale'; print sort 'a', chr(0xE1), 'b'" 1
    aáb

    (Replace cp850 with the proper encoding for your console.)
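For anyone who wants the same comparison as a script rather than a one-liner, here is a minimal sketch. The helper name sort_in_locale and the locale name 'fr_CA.ISO8859-1' are assumptions for illustration; locale names are platform-specific, and the helper simply returns undef when the requested locale is not installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Sort byte strings using the collation rules of the named locale.
# Returns an array ref, or undef if the locale is not available here.
sub sort_in_locale {
    my ($locale_name, @words) = @_;
    my $previous = setlocale(LC_COLLATE);      # remember the current setting
    return unless defined setlocale(LC_COLLATE, $locale_name);
    my @sorted = do { use locale; sort { $a cmp $b } @words };
    setlocale(LC_COLLATE, $previous);          # restore it
    return \@sorted;
}

# Show a list of single characters as hex code points.
sub as_hex { return join ' ', map { sprintf '%02X', ord } @_ }

my @words = ('b', chr(0xE1), 'a');   # chr(0xE1) is a-acute in Latin-1

# Without "use locale", cmp compares code points, so 0xE1 lands after 'b'.
my @plain = sort @words;
print "plain sort:  ", as_hex(@plain), "\n";

# ASSUMPTION: a Latin-1 French-Canadian locale with this name is installed.
my $french = sort_in_locale('fr_CA.ISO8859-1', @words);
print defined $french
    ? "locale sort: " . as_hex(@$french) . "\n"
    : "locale not installed; only the code-point order is shown\n";
```

The pragma is scoped to the do block, so only that one sort uses the locale's collation; the rest of the program keeps comparing by code point.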

    You have a few problems:

    • "á" (English: "a with acute accent", French: "a avec accent aigu") is Unicode character E1, not 9E.

    • You wrote use local; instead of use locale;

    • /a..zA..Z/ means

      1. "a"
      2. followed by a character other than "\n"
      3. followed by a character other than "\n"
      4. followed by "z"
      5. followed by "A"
      6. followed by a character other than "\n"
      7. followed by a character other than "\n"
      8. followed by "Z"

      You meant /[a-zA-Z]/

    • Perl ranges and character class ranges don't use alphabetical order.

    • French doesn't use "á".
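A small sketch of the regex points above, in case it helps: it compares the pattern as written with the intended character class, and shows that the class range goes by code point. The helper name which_match and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $dots  = qr/a..zA..Z/;   # 'a', two non-newline chars, 'zA', two more, 'Z'
my $class = qr/[a-zA-Z]/;   # any single ASCII letter

# Report which of the two patterns match a string (1 = match, 0 = no match).
sub which_match {
    my ($s) = @_;
    return ( $s =~ $dots ? 1 : 0, $s =~ $class ? 1 : 0 );
}

for my $s ('axyzAxyZ', 'q', chr(0xE1)) {
    my ($d, $c) = which_match($s);
    my $label = join ' ', map { sprintf 'U+%04X', ord } split //, $s;
    print "$label: dots=$d class=$c\n";
}

# Ranges follow code points: 0xE1 is above ord('z') == 0x7A,
# so a plain string comparison puts a-acute after 'z'.
print "a-acute sorts after z\n" if chr(0xE1) gt 'z';
```

Only the eight-character string matches the dotted pattern, and chr(0xE1) matches neither pattern, since it lies outside both a-z and A-Z.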

    The solution is to use POSIX or Unicode properties.

    >perl -le"my $s = chr(0xE1); print $s =~ /\p{Alpha}/ ?1:0"
    1

    >perl -le"my $s = chr(0xE1); print $s =~ /[\p{Alpha}]/ ?1:0"
    1

    >perl -le"my $s = chr(0xE1); utf8::upgrade($s); print $s =~ /[[:alpha:]]/ ?1:0"
    1
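The same \p{Alpha} test can be wrapped in a helper for checking whole words. This is only a sketch: the name is_all_letters and the sample words are hypothetical, and non-ASCII characters are written as escapes so the source file stays plain ASCII.

```perl
use strict;
use warnings;

# True if the string is non-empty and every character is a Unicode letter.
# \p{Alpha} applies Unicode rules regardless of the current locale.
sub is_all_letters {
    my ($word) = @_;
    return $word =~ /\A\p{Alpha}+\z/ ? 1 : 0;
}

# Hypothetical sample words.
my @samples = ("caf\x{E9}", "\x{E1}", "abc1", "Z");
for my $w (@samples) {
    # Print non-ASCII characters as escapes so no output encoding is needed.
    my $shown = join '',
        map { ord($_) < 128 ? $_ : sprintf('\x{%X}', ord($_)) } split //, $w;
    print "$shown => ", is_all_letters($w), "\n";
}
```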

    From the documentation, it seems to me the following should also work, but they don't:

    >perl -le"use feature 'unicode_strings'; my $s = chr(0xE1); print $s =~ /[[:alpha:]]/ ?1:0"
    0

    >perl -le"use 5.012; my $s = chr(0xE1); print $s =~ /[[:alpha:]]/ ?1:0"
    0

    Update: Fixed c&p mistake in char class.
    Update: Inserted Problem #1.
    Update: Oops, I seem to have forgotten to include the solution. Added.

      Yup, local for locale is a Typo.

      Bingo! 'encoding' is the piece I was missing. Thank you, ikegami. Now I'm off to re-factor and test the read-extract-and-build-the-list routines. I'll be back for air by midnight....

      ----
      I Go Back to Sleep, Now.

      OGB

        That just causes STDIN, STDOUT and STDERR to properly decode and encode characters. It's got nothing to do with the problem at hand. I added a solution to my post as you posted this reply.
Re: Does 'use locale' Change What Is Considered To Be 'alphabetical order'?
by jethro (Monsignor) on Aug 08, 2010 at 00:12 UTC
    Is 'use local' instead of 'use locale' in your script example just a copy mistake?
Re: Does 'use locale' Change What Is Considered To Be 'alphabetical order'?
by FunkyMonk (Chancellor) on Aug 08, 2010 at 00:19 UTC
    /a..zA..Z/
    Shouldn't that be a character class?

    Disclaimer: I'm strictly 7-bit here :)
