http://www.perlmonks.org?node_id=18231

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a transliterator that converts romanized text into another language's script. Typing on a roman keyboard is much easier, and the output is then displayed in the appropriate language font. For example:

dny = character #225
kh = character #35
k = character #12
h = character #10
a = character #8
kha = character #35#8 and not #12#10#8

What pattern should I write to split a string into those characters? For example, khatos should split into kh, a, t, o, s, so a three-character pattern should be tried first, then a two-character one, then a single character. It is guaranteed that there will always be some vowels in between, but the vowels can be multi-character too.

For example, the word mukharjee should split into m, u, kh, a, rj, ee. Is it possible to get the first characters that are not vowels, then the vowels, then the non-vowel characters again?
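The vowel/non-vowel split described above can be sketched with a single global match. This is only a minimal illustration: the helper name `split_clusters` is hypothetical, and it assumes that only a, e, i, o and u count as vowels, which may not hold for the target language.

```perl
use strict;
use warnings;

# Hypothetical helper: split a romanized word into alternating runs
# of vowels and non-vowels.
# Assumption: only a, e, i, o, u are treated as vowels.
sub split_clusters {
    my ($word) = @_;
    # In list context, a /g match without capture groups returns
    # every matched substring in order.
    return $word =~ /[aeiou]+|[^aeiou]+/gi;
}

print join(', ', split_clusters('mukharjee')), "\n";   # m, u, kh, a, rj, ee
print join(', ', split_clusters('khatos')), "\n";      # kh, a, t, o, s
```

Each piece returned this way would still need its own entry in the lookup table (for instance rj and ee), since a consonant run is not broken down any further.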

Once it is split, all I need to do is look up the associated character in an associative array and print it.

Thanks for your help.

Originally posted as a Categorized Question.

Re: How can I transliterate between languages?
by nardo (Friar) on Jun 15, 2000 at 19:11 UTC
    I didn't quite understand what you meant about vowels. Can vowels not be mapped directly, in the dny = #225 style? Anyway, here is some code that might do what you want, though depending on what you want to do with the vowels, it might not be exactly right.

    use strict;
    use warnings;

    my %letters = (
        'dny' => '#225',
        'kh'  => '#35',
        'k'   => '#12',
        'h'   => '#10',
        'a'   => '#8',
        't'   => '#7',
        'o'   => '#6',
        's'   => '#5',
    );
    my $string = 'khatos';

    # If you ever want to match more than 3 characters,
    # change the 3 below to whatever number you want.
    while ($string =~ s/([a-zA-Z]{1,3})/roman($1)/e) {}
    print "$string\n";

    sub roman {
        my $letter    = shift;
        my $remainder = '';   # letters chopped off so far, left for later passes
        while (length($letter) > 0) {
            if (exists($letters{$letter})) {
                return $letters{$letter} . $remainder;
            }
            # Not found: drop the last letter and retry with a shorter key.
            $remainder = chop($letter) . $remainder;
        }
        # If the character isn't found in %letters this returns '#?'.
        # If you return an alphabetic character which isn't in %letters,
        # the program will loop forever.
        return '#?';
    }
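As an alternative sketch (not nardo's method), the same longest-match-first behaviour can be had in a single pass by building a regex alternation from the table keys, sorted longest first. The table below is copied from the reply above; the sub name `transliterate` and the '#?' fallback for unknown characters are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Mapping copied from the reply above; the values are placeholders.
my %letters = (
    'dny' => '#225', 'kh' => '#35', 'k' => '#12', 'h' => '#10',
    'a'   => '#8',   't'  => '#7',  'o' => '#6',  's' => '#5',
);

# Build an alternation with longer keys first, so 'kh' wins over 'k'.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b } keys %letters;
my $token = qr/$alternation/;

# Hypothetical helper: replace each known token from the table and
# turn any other single character into '#?'.
sub transliterate {
    my ($text) = @_;
    $text =~ s/($token)|(.)/defined $1 ? $letters{$1} : '#?'/ge;
    return $text;
}

print transliterate('khatos'), "\n";   # prints #35#8#7#6#5
```

Because every character is consumed exactly once, this version cannot loop forever on input that is missing from the table, and adding longer keys needs no change to the pattern.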