
Re^2: Unknown charnames when building Encode

by yulivee07 (Sexton)
on Jan 19, 2017 at 15:02 UTC


in reply to Re: Unknown charnames when building Encode
in thread Unknown charnames when building Encode

Hi Ken, thank you very much for your suggestions. I updated my post with the Perl commands you suggested. I just noticed something interesting on the Linux machine when running perl -d t/Encode.t :
      DB<8> x $uni
    0  '\x{03B1}'
      DB<9> x is "\N{alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
    Unknown charname 'alpha' at (eval 23)[/usr/share/perl/5.18/perl5db.pl:732] line 2, within string
      DB<10> x is "\N{greek:alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
    ok 6 - alpha does not map to symbol 'a'
    0  1
So apparently \N{alpha} is also wrong on the Linux machine, but somehow the test case passes as "ok" anyway. Strange.

Unfortunately, changing alpha to greek:alpha didn't help on the AIX machine. AIX doesn't know about greek:alpha either.

My guess for now is that the Unicode tables on Linux and AIX differ. I am trying to find a way to display all available aliases of a Unicode code point. Something like this:
    $unicode_module->get_char_aliases('U+03B1');
    GREEK SMALL LETTER ALPHA
    ALPHA
    alpha
Anyways, I will keep searching. Thanks for all your help so far.

Replies are listed 'Best First'.
Re^3: Unknown charnames when building Encode
by kcott (Archbishop) on Jan 20, 2017 at 15:08 UTC

    Firstly, it seems I gave you something of a bum steer regarding &Unicode::UCD::charprops_all. I've been using it for a while and forgot that it was a fairly recent function (in terms of Perl versions). It was added in v5.22.0, along with some other functions, so won't be available on any of the Perl versions you're using. Sorry about that. See perl5220delta: Updated Modules and Pragmata.

    [Just a quick note on markup. While it's generally preferable to use '<code>' tags for code and data, which you've been doing, this doesn't work too well with Unicode characters (outside the ASCII range). In these cases, '<pre>' tags work better: for instance, you'll see 'α' instead of '&#945;'. For inline text, such as in paragraphs, '<tt>' tags serve the same purpose.]

    As I have neither Ubuntu nor AIX, I can't effectively reproduce your results. However, I looked into this a bit further and have a few other suggestions.

    As you successfully printed the characters from the codepoints:

    $ perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"'
    α - ß - £
    

    [I didn't need it, but you may need to add -Mutf8 to get rid of the "Wide character" message you're seeing.]

    See what names you get for those characters:

    $ perl -Mcharnames=:full -E 'say charnames::viacode($_) for (0x3b1, 0xdf, 0xa3)'
    GREEK SMALL LETTER ALPHA
    LATIN SMALL LETTER SHARP S
    POUND SIGN

    In terms of what you're referring to as "aliases", I suspect there are a very large number of these. Have a look at charnames; in particular, read what it says about :full, :loose and :short. There's a link to the algorithm for :loose matching, but it's horribly broken: it should be "http://www.unicode.org/reports/tr44/#Matching_Names". How :short is determined is explained on the charnames page.

    :full is fairly straightforward:

    $ perl -C -E 'say "\N{GREEK SMALL LETTER ALPHA}"'
    α
    

    Based on that #Matching_Names algorithm, I then tried:

    $ perl -C -E 'say "\N{greek small letter alpha}"'
    Unknown charname 'greek small letter alpha' at -e line 1, within string
    Execution of -e aborted due to compilation errors.

    However, when I specified -Mcharnames=:loose, it worked:

    $ perl -Mcharnames=:loose -C -E 'say "\N{greek small letter alpha}"'
    α
    

    Bearing in mind the :loose algorithm, you can see there's a huge number of possibilities. Here are a few examples:

    $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK_SMALL_LETTER_ALPHA}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL-LETTER-ALPHA}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL_LETTER-ALPHA}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{greek_small_letter_alpha}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{greek-small_letter-alpha}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{greek small-letter alpha}"'
    α
    $ perl -Mcharnames=:loose -C -E 'say "\N{GrEeK SmAlL-LeTtEr aLpHa}"'
    α
    

    Now, as shown in my earlier post, I was able to use the :short forms directly:

    $ perl -C -E 'say "\N{greek:alpha}"'
    α
    $ perl -Mcharnames=greek -C -E 'say "\N{alpha}"'
    α
    

    They didn't work for you, but maybe these might:

    $ perl -Mcharnames=:short -C -E 'say "\N{greek:alpha}"'
    α
    $ perl -Mcharnames=:short,greek -C -E 'say "\N{alpha}"'
    α
    

    I also had a brief look at the source code for charnames.pm and _charnames.pm, although I didn't delve into them too deeply. There are a lot of (non-POD) comments that may be of interest. Perhaps have a look at those for the versions you're using.

    — Ken

      Ken,

      thank you very much for your input. I found the source of my problems:

      Name.pl is missing in AIX:
      perl@t72:/usr/opt/perl5/lib/5.20.1/unicore $ ls
      Blocks.txt         Decomposition.pl  Name.pm             SpecialCasing.txt  UCD.pl
      CombiningClass.pl  Heavy.pl          NamedSequences.txt  To                 lib


      Compare that to my Linux distribution:
      perl@pod-racer:/usr/share/perl/5.18/unicore $ ls
      Blocks.txt         Decomposition.pl  lib                 Name.pl  SpecialCasing.txt  UCD.pl
      CombiningClass.pl  Heavy.pl          NamedSequences.txt  Name.pm  To                 version


      Name.pl is generated when perl is built, but mktables, the program that generates it, was nowhere to be found on the AIX system. It does not seem to be shipped with the packaged perl.

      The solution: download the source for the installed Perl version from CPAN into a temporary directory. On AIX 7.2 I have perl 5.20, so I downloaded perl 5.20 from CPAN (http://www.cpan.org/src/).

      Extract it and cd to lib/unicore. There you can find the mktables program. Then generate the Unicode files:

      chmod 755 mktables
      ./mktables


      This generates a bunch of files, including Name.pl. I diffed all newly generated files against the ones from my system's perl and found no differences, so I copied only Name.pl to my system's unicore path. The errors then vanished.
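
      A quick way to check for this situation on another machine is to look for unicore/Name.pl under each directory in @INC. This is a sketch that assumes charnames finds the file through @INC; the function name find_name_pl is made up here.

```shell
# Print every unicore/Name.pl reachable through perl's @INC. No output
# means the name table is missing, as on the AIX box above.
find_name_pl() {
    perl -e '
        for my $dir (@INC) {
            next if ref $dir;                  # skip code-ref hooks in @INC
            my $file = "$dir/unicore/Name.pl";
            print "$file\n" if -f $file;
        }
    '
}

hits="$(find_name_pl)"
if [ -n "$hits" ]; then
    printf 'found:\n%s\n' "$hits"
else
    echo 'unicore/Name.pl not found under @INC'
fi
```

      Running it before and after copying Name.pl into place should show the file appearing.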
