PerlMonks  

I've been asked to help with a project involving Lao script, and I need to alphabetize lists of Lao words. Perl's built-in support for Lao is minimal: there is \p{InLao} to identify a character in the Lao block, and not much else. I have been unable to find a predefined locale definition (localedef source) or anything similar for Lao. Searching PerlMonks revealed virtually nothing on localedef; as it turns out, Perl may use locale data, but that data seems to come from the C library rather than from Perl itself.
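
For reference, \p{InLao} is enough to tell whether a string consists only of characters from the Lao block, though it says nothing about their order. A minimal sketch (the helper name is_lao_word is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is non-empty and every
# character falls in the Unicode Lao block (U+0E80..U+0EFF).
sub is_lao_word {
    my ($str) = @_;
    return $str =~ /\A\p{InLao}+\z/ ? 1 : 0;
}
```

Note that this only tests block membership; a plain sort on such strings still compares code points, which is not Lao alphabetical order.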

It appears a new Lao alphabetizing routine is needed, and I may have to write the collation rules myself. Here's the tough part: Lao does not fit a typical alphabetic sort.

  1. Lao words are first sorted by consonant order.
  2. Vowels follow consonants in alphabetical order, but not necessarily in written order. For example, some vowels are written before the consonant even though they are pronounced after it, and the alphabetical order follows pronunciation.
  3. After the usual list of single-character consonants, Lao has some "diphthong" consonants (two-character combinations, strictly speaking digraphs), which have their own alphabetical placements.
All of this adds up to a challenging puzzle for a Perl enthusiast. I welcome your thoughts on how this could be done, and on how it should be done so that it follows standard practice and can serve the entire Perl community for Lao script.
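
A minimal sketch of how the three rules above might translate into a sort key. Everything here is an assumption for illustration: the @ORDER table is tiny and incomplete, the placement of the two-character consonant and of the vowels is a guess, and the names lao_sort_key and lao_sort are made up. A real table would have to come from an authoritative Lao dictionary.

```perl
use strict;
use warnings;

# ASSUMPTION: an illustrative, incomplete collation table. Entries are listed
# in the order they should sort; a real table needs every consonant and vowel.
my @ORDER = (
    "\x{0E81}",          # LAO LETTER KO
    "\x{0E82}",          # LAO LETTER KHO SUNG
    "\x{0E84}",          # LAO LETTER KHO TAM
    "\x{0E87}",          # LAO LETTER NGO
    "\x{0EAB}\x{0E87}",  # HO SUNG + NGO as one two-character consonant (assumed placement)
    "\x{0EB2}",          # LAO VOWEL SIGN AA (written after its consonant)
    "\x{0EC0}",          # LAO VOWEL SIGN E  (written before its consonant)
);
my %RANK;
@RANK{@ORDER} = (1 .. scalar @ORDER);

# Longest entries first, so a two-character consonant wins over its first letter.
my $alt   = join '|', map { quotemeta } sort { length($b) <=> length($a) } @ORDER;
my $TOKEN = qr/($alt)|(.)/s;

# A consonant is any multi-character table entry or one letter in U+0E81..U+0EAE.
my $multi = join '|', map { quotemeta } grep { length($_) > 1 } @ORDER;
my $CONS  = qr/(?:$multi|[\x{0E81}-\x{0EAE}])/;

sub lao_sort_key {
    my ($word) = @_;

    # Rule 2: move a vowel written before its consonant (U+0EC0..U+0EC4)
    # behind that consonant, so the key follows pronunciation order.
    (my $spoken = $word) =~ s/([\x{0EC0}-\x{0EC4}])($CONS)/$2$1/g;

    # Rules 1 and 3: replace each table entry by its rank.
    my @ranks;
    while ($spoken =~ /$TOKEN/g) {
        # Characters missing from the table sort after all listed entries,
        # in code point order.
        push @ranks, defined $1 ? $RANK{$1} : scalar(@ORDER) + ord($2);
    }

    # Fixed-width big-endian integers, so a plain string comparison of two
    # keys compares their rank sequences element by element.
    return pack 'N*', @ranks;
}

sub lao_sort {
    # Compute each key once, sort on the keys, then return the original words.
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ lao_sort_key($_), $_ ] } @_;
}
```

With this table, lao_sort places a word written as E + KO between KO + AA and KHO SUNG + AA, whereas a code point sort would put it last because U+0EC0 is higher than every consonant. Whether a hand-built table like this is preferable to tailoring Unicode::Collate is exactly the kind of advice I am looking for.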

I have already developed a "Lao.pm" module (not yet submitted to CPAN, and it may need a different namespace) that classifies Lao characters as consonants, vowels, punctuation, or tone marks, and further classifies the consonants by their Lao classes (high/mid/low). So I have the tools for distinguishing at the character level, e.g. \p{Lao::InLaoCons}\p{Lao::InLaoTone}\p{Lao::InLaoVowel}, but I still need to map the characters to an alphabetical order, and this part seems beyond my experience.
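
For anyone unfamiliar with package-qualified properties like these, here is a rough sketch of the mechanism using Perl's user-defined character properties. It is not the actual Lao.pm: the package name LaoSketch, the classify helper, and the code point ranges are approximations chosen for illustration.

```perl
use strict;
use warnings;

package LaoSketch;   # hypothetical namespace, not the real Lao.pm

# A user-defined property is a sub whose name starts with "In" or "Is" and
# which returns newline-separated hex ranges ("START\tEND").
# ASSUMPTION: these ranges are approximate and only meant as an example.
sub InLaoCons  { return "0E81\t0EAE\n0EDC\t0EDD\n" }
sub InLaoVowel { return "0EB0\t0EBD\n0EC0\t0EC4\n" }
sub InLaoTone  { return "0EC8\t0ECB\n" }

package main;

# Hypothetical helper: report the class of a single character.
sub classify {
    my ($char) = @_;
    return 'consonant' if $char =~ /\A\p{LaoSketch::InLaoCons}\z/;
    return 'vowel'     if $char =~ /\A\p{LaoSketch::InLaoVowel}\z/;
    return 'tone'      if $char =~ /\A\p{LaoSketch::InLaoTone}\z/;
    return 'other';
}
```

Properties like these answer "what kind of character is this?" but carry no ordering, which is why a separate rank table or collation tailoring is still needed.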

Blessings,

~Polyglot~


In reply to New Alphabet Sort Order by Polyglot
