
Unicode Transliteration

by philkime (Beadle)
on Mar 24, 2016 at 19:16 UTC ( #1158742=perlquestion )

philkime has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for a good transliteration module able to cope with a range of scripts. There is Unicode::Transliterate, which is old but still seems to compile and work with current ICU libraries. However, it's said to be alpha quality and has a lot of compiler warnings. Lingua::Translit doesn't have very many scripts (no Indic) but is extensible. I tried to write a mapping for Latin<->Devanagari, but it's all driven by data rules, which makes defining new mappings a pain: no coding, just XML rules with no control over NFC/NFD etc. Lingua::Deva seems to work, but it's just for Devanagari and I'd prefer something more general. So, does anyone know what happened to PICU - the "wrapper for ICU"? ICU is the way to go, but as far as I know there has never been a decent Perl wrapper for it. I heard rumours that Perl 6 would use ICU internally, but that was, like a lot of Perl 6 news, years and years ago ...
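To illustrate why control over NFC/NFD matters for rule-driven mappings, here is a minimal sketch using only the core Unicode::Normalize module. It is not taken from Lingua::Translit or any other module mentioned here; `same_text` is a hypothetical helper named just for this example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);   # core module

# The same IAST letter "a with macron", encoded two ways.
my $precomposed = "\x{0101}";    # LATIN SMALL LETTER A WITH MACRON
my $decomposed  = "a\x{0304}";   # "a" followed by COMBINING MACRON

# Hypothetical helper: compare two strings after normalizing both to NFC.
sub same_text {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right);
}

printf "raw eq:     %s\n", ($precomposed eq $decomposed) ? "yes" : "no";
printf "after NFC:  %s\n", same_text($precomposed, $decomposed) ? "yes" : "no";
printf "NFD length: %d\n", length(NFD($precomposed));
```

A rule keyed on the precomposed letter silently misses decomposed input (and vice versa) unless both the rule table and the text are normalized to the same form first.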

Replies are listed 'Best First'.
Re: Unicode Transliteration
by Corion (Pope) on Mar 24, 2016 at 19:34 UTC

    I don't know if you're tied to ICU, but I've had good experience with Text::Unidecode, which turns Unicode strings (back to) Roman text data.

      Thanks for the recommendation, but I should have said that I need conversion between strict, standards-based scripts like IAST and Devanagari; that module just (quite well, apparently) lets you do a helpful ASCII transliteration. Specifically, I need to be able to do things like IAST Sanskrit -> Devanagari, as this is the way to collate such languages.
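For concreteness, the kind of IAST -> Devanagari conversion meant here can be sketched roughly as below, in core Perl only. This is an illustration under loud assumptions, not the method of Lingua::Translit, Lingua::Deva or ICU: `iast_to_deva` is a made-up name, the tables are deliberately partial, and the source file is assumed to be saved as UTF-8 with precomposed IAST letters.

```perl
use strict;
use warnings;
use utf8;                          # this source contains UTF-8 literals
use Unicode::Normalize qw(NFC);    # core module

# Partial tables (assumption: enough for the demo words, not full IAST).
my %cons = (
    k => 'क', kh => 'ख', g => 'ग', gh => 'घ',
    t => 'त', th => 'थ', d => 'द', dh => 'ध', n => 'न', 'ṇ' => 'ण',
    p => 'प', m => 'म', y => 'य', r => 'र', l => 'ल', v => 'व',
    'ś' => 'श', 'ṣ' => 'ष', s => 'स', h => 'ह',
);
# Each vowel: [ independent letter, dependent sign used after a consonant ].
my %vowel = (
    a   => ['अ', ''],           'ā' => ['आ', "\x{093E}"],
    i   => ['इ', "\x{093F}"],   'ī' => ['ई', "\x{0940}"],
    u   => ['उ', "\x{0941}"],   'ū' => ['ऊ', "\x{0942}"],
    'ṛ' => ['ऋ', "\x{0943}"],   e   => ['ए', "\x{0947}"],
    ai  => ['ऐ', "\x{0948}"],   o   => ['ओ', "\x{094B}"],
    au  => ['औ', "\x{094C}"],
);
my %mark   = ('ṃ' => "\x{0902}", 'ḥ' => "\x{0903}");   # anusvara, visarga
my $virama = "\x{094D}";

# Longest tokens first, so "kh" and "ai" win over "k" and "a".
my $token = join '|',
    sort { length($b) <=> length($a) } (keys %cons, keys %vowel, keys %mark);

sub iast_to_deva {
    my ($text) = @_;
    my $in      = NFC(lc $text);   # normalize so table keys can match
    my $out     = '';
    my $pending = 0;               # last consonant still awaits its vowel
    for my $t ($in =~ /($token|.)/gs) {
        if (exists $cons{$t}) {
            $out .= $virama if $pending;   # consonant cluster: kill inherent "a"
            $out .= $cons{$t};
            $pending = 1;
        }
        elsif (exists $vowel{$t}) {
            $out .= $vowel{$t}[ $pending ? 1 : 0 ];
            $pending = 0;
        }
        else {
            $out .= $virama if $pending;
            $out .= $mark{$t} // $t;       # unknown characters pass through
            $pending = 0;
        }
    }
    $out .= $virama if $pending;           # word-final bare consonant
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print iast_to_deva($_), "\n" for qw(rāma dharma kṛṣṇa);
```

Greedy matching cannot tell an aspirate like "th" from "t" followed by "h", or "ai" from "a" plus "i"; a real implementation needs a convention for those cases and the complete letter inventory.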
Re: Unicode Transliteration
by zwon (Abbot) on Mar 24, 2016 at 22:23 UTC
    However, it's said to be alpha quality and has a lot of compiler warnings.
    But does it work for you? BTW, I see only three deprecation warnings when I build it on Ubuntu with libicu52.
      I rebuilt and fixed the warnings and it does now appear to build cleanly - a real tribute to the backwards compat of ICU ... I have to wait until my Sanskrit source can verify if the transliteration looks ok ...
        Apparently not. It seems that ICU doesn't support IAST, only the more general and different ISO 15919. Ah well, perhaps I will have to fight with Lingua::Translit.
Re: Unicode Transliteration
by captainjames (Novice) on Dec 23, 2018 at 05:42 UTC
    > So, does anyone know what happened to PICU - the "wrapper for ICU"?

    I was the co-author of PICU back in 2002. The source is still online, but I don't believe you will be able to build it 2 decades later without significant effort.

Node Type: perlquestion [id://1158742]
Approved by Corion