
Unicode puzzle

by Krambambuli (Curate)
on Apr 09, 2009 at 06:50 UTC (#756524)
Krambambuli has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks,

Someone I know brought up an interesting question: is there a (more or less) easy way to determine, for a given Unicode character, in which language(s) it occurs?

If I'm not mistaken, for U+03FF, for example, that would be ('Greek', 'Coptic').

So far I have no real idea how to tackle this.



Re: Unicode puzzle
by ikegami (Pope) on Apr 09, 2009 at 06:59 UTC

    This probably won't help, but on the off chance that it gives you an idea of what to search for or where to look...

    The script to which each character belongs has been tabulated, as evidenced by the ability to use \p{Greek} in regexp matches. I just don't know whether Perl exposes the reverse operation. The list of recognized scripts is in perlunicode and originates from "the Unicode database", a set of files distributed by the Unicode Consortium.
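    A minimal sketch of both directions, assuming a Perl with reasonably current Unicode support. The forward test is the \p{Greek} match mentioned above. The "reverse" here is only a brute-force loop over a hand-picked list of script names; that list (@candidate_scripts) is an illustrative assumption, not the full set documented in perlunicode, so this is not a real reverse lookup.

```perl
use strict;
use warnings;

# U+03FF, the character from the original question
my $char = "\x{03FF}";

# Forward direction: does the character belong to the Greek script?
print "U+03FF is Greek\n" if $char =~ /\p{Greek}/;

# Brute-force "reverse": test each candidate script property in turn.
# This list is a small, hand-picked assumption, not every script Perl knows.
my @candidate_scripts = qw(Latin Greek Coptic Cyrillic Armenian Hebrew Arabic);

# $_ is interpolated into the pattern, so each pass tests \p{Latin}, \p{Greek}, ...
my @matches = grep { $char =~ /\p{$_}/ } @candidate_scripts;
print "Matching scripts: @matches\n";
```

    Trying every name like this scales poorly; it only shows that the data behind \p{...} is keyed by script, which is why a per-character lookup function is the better route.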

Re: Unicode puzzle
by Krambambuli (Curate) on Apr 09, 2009 at 08:29 UTC
    Answering myself, after benefiting from ikegami's suggestion about where to look.

    As almost always, CPAN has a solution, and this time it is Unicode::UCD (a core module, so nothing extra needs to be installed). So I can write:

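    $ perl -e 'use Unicode::UCD qw(charinfo); use Data::Dumper; print Dumper( charinfo( ord("é"))), "\n";'
    $VAR1 = {
              'digit' => '',
              'bidi' => 'L',
              'category' => 'Lu',
              'code' => '00C3',
              'script' => 'Latin',
              'combining' => '0',
              'upper' => '',
              'name' => 'LATIN CAPITAL LETTER A WITH TILDE',
              'unicode10' => 'LATIN CAPITAL LETTER A TILDE',
              'decomposition' => '0041 0303',
              'comment' => '',
              'mirrored' => 'N',
              'lower' => '00E3',
              'numeric' => '',
              'decimal' => '',
              'title' => '',
              'block' => 'Latin-1 Supplement'
            };

    Note that this dump describes U+00C3 (LATIN CAPITAL LETTER A WITH TILDE), not é: without "use utf8", the é inside the one-liner is read as its two UTF-8 bytes, C3 A9, and ord() returns only the first of them. Running it as perl -Mutf8 -e '...', or calling charinfo(0xE9) directly, looks up é itself.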
    et voilà :) Thank you, that's all I was looking for.
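    For the character from the question, a short sketch using charscript and charblock (two further functions that Unicode::UCD exports on request) suggests where ('Greek', 'Coptic') probably came from: U+03FF lies in the block named "Greek and Coptic", while its script is plain Greek. This assumes a reasonably recent Perl; the exact strings depend on the Unicode version that Perl ships.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo charscript charblock);

# Write code points numerically (0xE9 is é) so the result does not
# depend on the source encoding or on "use utf8".
for my $cp (0x00E9, 0x03FF) {
    my $info = charinfo($cp);    # hash ref, or undef for unassigned code points
    printf "U+%04X %s\n", $cp, $info->{name};
    printf "  script: %s\n", charscript($cp);
    printf "  block:  %s\n", charblock($cp);
}
```

    Script and block are different properties: a block is just a contiguous range of code points, and its name need not match the script of every character in it.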

Re: Unicode puzzle
by afoken (Abbot) on Apr 09, 2009 at 07:28 UTC

    I doubt that you could get a perfect answer. That data is simply not part of the Unicode character database: it records which script a character belongs to, not which languages use it. Let me give you just one example: the letter é is commonly used in French, but it is not considered a "native inhabitant" of the German language. Yet it IS used in German, in words like Café (meaning the place, not the drink). So what would be your answer to the "languages of é" question?


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
    Re^2: Unicode puzzle
    by Krambambuli (Curate)

      You're right, of course: my initial question wasn't well formulated. Be forgiving, it was the best I could manage at that moment :)
