PerlMonks  

Re^4: sorting Chinese characters

by larryk (Friar)
on Feb 12, 2013 at 05:31 UTC (#1018298)


in reply to Re^3: sorting Chinese characters
in thread sorting Chinese characters

I have three indexes and only one is Chinese. If I needed a 'display sort' (i.e. I was going to show the index keys to the user) then I'd specialise my indexing class and use a native locale for the key type. However, in this case, the keys only need to be internally consistent between index creation (Perl) and consumption (C#). I was expecting that Unicode support between the two languages would be sufficiently mature that I could rely on the defined standard.
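A minimal sketch of what "internally consistent" ordering could look like on the Perl side, assuming the keys are decoded character strings and that plain code point order is acceptable. The sample keys are hypothetical; the real index keys are not shown in this post.

```perl
use strict;
use warnings;
use utf8;    # the sample keys below are UTF-8 literals

# Hypothetical index keys, for illustration only.
my @keys = ('two', '文', 'one', '中', '一');

# With no "use locale" in scope, cmp compares decoded strings
# character by character, by code point.
my @sorted = sort { $a cmp $b } @keys;

binmode STDOUT, ':encoding(UTF-8)';
for my $key (@sorted) {
    # Show each key next to its code points so the order can be checked by eye.
    my $cps = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $key;
    print "$key\t$cps\n";
}
```

On the C# side, the closest counterpart would probably be an ordinal comparison such as StringComparer.Ordinal, which compares UTF-16 code units. That agrees with code point order for these sample keys, but can differ once characters outside the Basic Multilingual Plane are involved.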

As it turns out, when I re-sorted the index in a debugging session in C# and then diffed the Perl index against the C# index, there were fewer differences than unreachable keys. A large block of mis-sorted entries was disrupting the binary search for nearby entries (including 'one'). In addition, two or three Chinese characters sorted differently between the two languages in a few places, and those caused the rest of the dead ends.
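To illustrate how a mis-sorted block can make a key unreachable even though it is present, here is a small sketch. The key list, find_key and out_of_order are hypothetical names made up for this example, not the actual index code.

```perl
use strict;
use warnings;

# Plain binary search. It is only reliable if @$keys is sorted
# under the same comparison (cmp) that the search uses.
sub find_key {
    my ($keys, $want) = @_;
    my ($lo, $hi) = (0, $#$keys);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $keys->[$mid] cmp $want;
        return $mid if $c == 0;
        if ($c < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return -1;    # not found
}

# Positions where an entry sorts before its predecessor.
sub out_of_order {
    my ($keys) = @_;
    return grep { ($keys->[$_ - 1] cmp $keys->[$_]) > 0 } 1 .. $#$keys;
}

# Hypothetical index in which 'x' and 'y' were written too early.
my @index = qw(a b x y one p q);

print "lookup in mis-sorted index: ", find_key(\@index, 'one'), "\n";
print "out-of-order positions: ", join(',', out_of_order(\@index)), "\n";

my @resorted = sort { $a cmp $b } @index;
print "lookup after re-sorting: ", find_key(\@resorted, 'one'), "\n";
```

The key 'one' sits at position 4 of the mis-sorted list, but the search is steered away from it by 'y' and 'x' and gives up. After re-sorting with the same comparison the search uses, the lookup succeeds.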

By excluding these few entries during index creation, I now have a 100% match.

If I get some time it'll be interesting to find out why those few characters are really being sorted in a different order. Might be a bug in either C# or Perl.
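One possible starting point for that investigation, sketched under the assumption that both key lists can be loaded as decoded strings: find the positions where the two orderings disagree and print the code points involved. The two sample lists and the helper names are hypothetical.

```perl
use strict;
use warnings;
use utf8;    # the sample keys below are UTF-8 literals

# Render a key as a list of U+XXXX code points.
sub codepoints {
    my ($key) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $key;
}

# Positions at which two key lists disagree (up to the shorter length).
sub differing_positions {
    my ($left, $right) = @_;
    my $last = $#$left < $#$right ? $#$left : $#$right;
    return grep { $left->[$_] ne $right->[$_] } 0 .. $last;
}

# Hypothetical orderings, standing in for the Perl and C# indexes.
my @perl_order  = ('一', '中', '文');
my @other_order = ('一', '文', '中');

binmode STDOUT, ':encoding(UTF-8)';
for my $i (differing_positions(\@perl_order, \@other_order)) {
    printf "%d: %s vs %s\n", $i,
        codepoints($perl_order[$i]), codepoints($other_order[$i]);
}
```

Seeing the raw code points side by side would at least show whether the odd entries share a range or property, which might hint at which comparison each language is applying.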

Anyway, thanks for your help.

   larryk                                          
perl -le "s,,reverse killer,e,y,rifle,lycra,,print"

