PerlMonks  

How can I transliterate between languages?

Contributed by Anonymous Monk on Jun 15, 2000 at 07:00 UTC


Description:

I am trying to write a transliterator that converts romanized text into another language's script. It is much easier to type on a roman keyboard, with the output then displayed in the appropriate language font. For example:

dny = character #225
kh = character #35
k = character #12
h = character #10
a = character #8
kha = character #35#8 and not #12#10#8

What pattern should I write to split a string into those characters? For example, khatos should split into kh, a, t, o, s. So a three-character pattern should match first, then a two-character pattern, then a single character. It is guaranteed that there will always be some vowels in between, but the vowels can be multi-character too.

For example, the word mukharjee should split into m, u, kh, a, rj, ee. Is it possible to get the leading characters that are not vowels, then the vowels, then the non-vowel characters again?

Once the string is split, all I need to do is look up the associated character in an associative array and print it.

Thanks for your help.

Answer: How can I transliterate between languages?
contributed by nardo

I didn't quite understand what you meant when you were talking about vowels. Can vowels not be mapped directly using the dny = #225 style? Anyway, here is some code that might do what you want, although depending on how you need the vowels handled it may not be an exact fit.

    use strict;
    use warnings;

    my %letters = ('dny' => '#225', 'kh' => '#35', 'k' => '#12', 'h' => '#10',
                   'a' => '#8', 't' => '#7', 'o' => '#6', 's' => '#5');
    my $string = 'khatos';

    # If you ever want to match more than 3 characters,
    # change the 3 below to whatever number you want.
    while ($string =~ s/([a-zA-Z]{1,3})/roman($1)/e) {}
    print "$string\n";

    # Translate the longest known prefix of $letter and hand back the
    # untranslated rest so the loop above can process it on the next pass.
    sub roman {
        my $letter    = shift;
        my $remainder = '';
        while (length($letter) > 0) {
            if (exists($letters{$letter})) {
                return $letters{$letter} . $remainder;
            }
            $remainder = chop($letter) . $remainder;
        }
        # The first character isn't in %letters: replace only that
        # character with '#?' and keep the rest. Never return an
        # alphabetic character that isn't in %letters here, or the
        # program will loop forever.
        return '#?' . substr($remainder, 1);
    }
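Another way to get the "longest pattern first" behaviour asked for in the question is to build a single alternation from the keys of the table, sorted longest first, and let the regex engine do the splitting. This is only a rough sketch, not nardo's method: the sub names tokenize, transliterate and split_runs are made up for illustration, the table is the same sample data as above, and the vowel set [aeiou] is an assumption about the romanization scheme.

```perl
use strict;
use warnings;

# Sample mapping table (same illustrative values as in the answer above).
my %letters = (
    dny => '#225', kh => '#35', k => '#12', h => '#10',
    a   => '#8',   t  => '#7',  o => '#6',  s => '#5',
);

# Perl tries alternatives left to right, so sorting the keys longest
# first makes 'dny' win over 'kh', and 'kh' win over 'k'.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %letters;
my $token_re = qr/$alternation/;

# Split a word into known tokens (call in list context). Any character
# that is not in the table becomes its own one-character token.
sub tokenize {
    my ($word) = @_;
    return $word =~ /($token_re|.)/gs;
}

# Look each token up in the table; unknown tokens become '#?'.
sub transliterate {
    my ($word) = @_;
    return join '',
        map { exists $letters{$_} ? $letters{$_} : '#?' } tokenize($word);
}

# Alternative split that ignores the table and simply alternates
# between runs of vowels and runs of non-vowels (assumed vowels: aeiou).
sub split_runs {
    my ($word) = @_;
    return $word =~ /([aeiou]+|[^aeiou]+)/gi;
}

print join(',', tokenize('khatos')), "\n";        # kh,a,t,o,s
print transliterate('khatos'), "\n";              # #35#8#7#6#5
print join(',', split_runs('mukharjee')), "\n";   # m,u,kh,a,rj,ee
```

tokenize only knows the clusters that are in the table, while split_runs treats every consonant cluster (such as rj) as one unit, so which one fits depends on whether every cluster you expect has its own entry in the associative array.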
