PerlMonks  

Hi! It has been some time since my last post, but I haven't forgotten about you guys :)

Bit of an oddball query, this one... but I bet someone knows one or two ways to do it.

I'm building search indexes in Perl for my Chinese dictionary mobile app, and I'm hitting a problem: the keys (Chinese characters), as sorted by Perl, are not in exactly the order that the on-device binary search expects.

Concretely, there are 165910 index records, and 188 of them are unreachable on-device because Perl's standard string sort and C#'s String.Compare function (with the "en-US" culture) order them slightly differently.
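One way to reproduce the "unreachable" count off-device is to re-run the binary search in Perl with a comparator that imitates the device's, and count the keys it cannot find. This is a minimal sketch assuming the device does a standard binary search; `bsearch_found` and `count_unreachable` are hypothetical helper names, the three keys are made-up samples, and the UTF-16 comparator is only a stand-in for an ordinal C# comparison. It does not reproduce String.Compare with the "en-US" culture.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: binary-search for $key in the array ref $sorted using
# comparator $cmp (returns -1, 0 or 1, like Perl's cmp). Returns 1 if found.
sub bsearch_found {
    my ($sorted, $key, $cmp) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $cmp->($key, $sorted->[$mid]);
        return 1 if $c == 0;
        if ($c < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
    }
    return 0;
}

# Hypothetical helper: count the keys in $sorted that a binary search using
# $cmp cannot find, i.e. keys made "unreachable" by a comparator mismatch.
sub count_unreachable {
    my ($sorted, $cmp) = @_;
    return scalar grep { !bsearch_found($sorted, $_, $cmp) } @$sorted;
}

# Sample index built in code-point order (Perl's cmp without 'use locale').
my @keys = sort { $a cmp $b } ("\x{4E00}", "\x{FA0E}", "\x{20000}");

# Assumed stand-in for an ordinal UTF-16 comparison on the device
# (NOT String.Compare with the "en-US" culture).
my $utf16_cmp = sub { encode('UTF-16BE', $_[0]) cmp encode('UTF-16BE', $_[1]) };

print count_unreachable(\@keys, $utf16_cmp), "\n";
```

With these sample keys this prints 1 (U+20000 cannot be found), while passing `sub { $_[0] cmp $_[1] }` as the comparator gives 0. Running the real index through the same check with a better imitation of the device comparator would show whether it reproduces the 188 figure.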

I've experimented with the culture settings for weeks, and 188 unreachable records is the best result I have achieved. So, two questions:

1. (Long shot) Has anyone seen this issue before and knows the magic incantation to make Perl and C# agree 100% on the sort order of UTF-8 strings?

2. (Failing that) How do I get Perl to sort by Unicode code point (i.e. the raw underlying \u{xxxx} value)? I can probably force C# to do it that way as an exception for this index.
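On question 2, Perl's default string comparison is already a code-point comparison: without `use locale`, `cmp` (and therefore a plain `sort`) compares characters by their numeric code points, and because UTF-8 preserves code-point order, sorting undecoded UTF-8 bytes gives the same order. One caveat on the C# side: an ordinal comparison such as `String.CompareOrdinal` compares UTF-16 code units, not code points, and the two orders differ when a supplementary character (U+10000 and above, e.g. CJK Extension B) is compared with a BMP character at U+E000 or above. A minimal sketch, using three hypothetical sample keys and a hypothetical display helper `show`, that contrasts the three orders:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample keys: U+20000 (CJK Extension B, outside the BMP),
# U+FA0E (a CJK compatibility ideograph) and U+4E00.
my @keys = ("\x{20000}", "\x{FA0E}", "\x{4E00}");

# Without 'use locale', cmp compares strings character by character by
# code point, so this is a plain code-point sort.
my @by_codepoint = sort { $a cmp $b } @keys;

# UTF-8 byte order matches code-point order, so sorting the raw UTF-8
# octets (e.g. read from a file without decoding) gives the same result.
my @by_utf8 = map  { decode('UTF-8', $_) }
              sort { $a cmp $b }
              map  { encode('UTF-8', $_) } @keys;

# Emulates an ordinal UTF-16 comparison (as in C#'s String.CompareOrdinal):
# big-endian UTF-16 bytes sort in the same order as UTF-16 code units.
my @by_utf16 = sort { encode('UTF-16BE', $a) cmp encode('UTF-16BE', $b) } @keys;

# Hypothetical helper for display only: list the code points as U+XXXX.
sub show { join ' ', map { sprintf('U+%04X', ord($_)) } @_ }

print 'code point: ', show(@by_codepoint), "\n";
print 'UTF-8:      ', show(@by_utf8), "\n";
print 'UTF-16:     ', show(@by_utf16), "\n";
```

The code-point and UTF-8 lines both list U+4E00 U+FA0E U+20000, while the UTF-16 line lists U+4E00 U+20000 U+FA0E. So if the C# side is switched to an ordinal comparison, either the index has to be built with the UTF-16 ordering above, or the C# comparison has to work on code points rather than UTF-16 code units.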

Any help much appreciated.

   larryk                                          
perl -le "s,,reverse killer,e,y,rifle,lycra,,print"

In reply to sorting Chinese characters by larryk
