PerlMonks  

Unknown charnames when building Encode

by yulivee07 (Sexton)
on Jan 19, 2017 at 08:23 UTC ( [id://1179897]=perlquestion )

yulivee07 has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Perlmonks, I am trying to build Encode 2.88-3 from CPAN on AIX 7.2. During the make test phase I receive various errors about unknown character names:
Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 499.
Unknown charname 'alpha' at t/Encode.t line 44, within string
BEGIN not safe after errors--compilation aborted at t/Encode.t line 148.
t/Encode.t .................
t/encoding-locale.t ........ ok
Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 459.
Unknown charname 'LATIN SMALL LETTER SHARP S' at t/encoding.t line 77, within string
BEGIN not safe after errors--compilation aborted at t/encoding.t line 165.
Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 459.
Unknown charname 'POUND SIGN' at t/mime-header.t line 166, within string
Execution of t/mime-header.t aborted due to compilation errors.
# Looks like your test exited with 2 just after 1.
t/mime-header.t ............
To check whether this is an AIX problem or a Perl problem, I tried to build the same version on my Linux system, where Encode installs just fine.
To pick the first error:
is "\N{alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
It seems AIX perl is unable to find the \N{alpha} character. I am a bit lost here - where does perl usually search for characters like this?
I need a hint about which direction to search in. Can someone provide some debugging tips?
Updates:
perl -E 'use charnames (); say $charnames::VERSION'
1.40

perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"'
Wide character in say at -e line 1.
α - ß - £

perl -E 'use Unicode::UCD; say $Unicode::UCD::VERSION'
0.58

perl -MUnicode::UCD=charprops_all -E 'say charprops_all("U+$_")->{Age} for qw{3b1 df a3}'
"charprops_all" is not exported by the Unicode::UCD module
Can't continue after import errors at -e line 0.

perl -C -E 'say "\N{greek:alpha}"'
Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 459.
Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 499.
Unknown charname 'greek:alpha' at -e line 1, within string
Execution of -e aborted due to compilation errors.
The build process uses CPAN (perl -MCPAN -eshell) to install modules. We use local::lib to install into a specific directory rather than the system perl path. The perl we are using is the one that comes with AIX 7.2, so we did not build perl ourselves.
perl -V
Summary of my perl5 (revision 5 version 20 subversion 1) configuration:
  Platform:
    osname=aix, osvers=6.1.0.0, archname=aix-thread-multi
    uname='aix blade08 1 6 00003c3ad100 '
    config_args='-d -Dprefix=/usr/opt/perl5 -Dcc=xlc_r -Duseshrplib -Dusethreads'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='xlc_r -q32', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -qlanglvl=extc99 -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES',
    optimize='-O',
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -qlanglvl=extc99 -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT'
    ccversion='12.1.0.9', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -brtl -bdynamic -b32'
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -ldbm -ldl -lld -lm -lcrypt -lpthreads -lc
    perllibs=-lbind -lnsl -ldl -lld -lm -lcrypt -lpthreads -lc
    libc=, so=a, useshrplib=true, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -bE:/usr/opt/perl5/lib/5.20.1/aix-thread-multi/CORE/perl.exp'
    cccdlflags=' ', lddlflags='-bhalt:4 -G -bI:$(PERL_INC)/perl.exp -bE:$(BASEEXT).exp -bnoentry -lpthreads -lc -lm'

Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
                        USE_ITHREADS USE_LARGE_FILES USE_LOCALE
                        USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Built under aix
  Compiled at Feb 6 2015 14:54:29
  %ENV:
    PERL5LIB="/home/perl_ss/perl5/lib/perl5/aix-thread-multi:/home/perl_ss/perl5/lib/perl5:/usr/local/lib/site_perl/5.8.8:/usr/local/site_perl/common"
    PERL5OPT=""
    PERL5_CPANPLUS_IS_RUNNING="9961732"
    PERL5_CPAN_IS_RUNNING="9961732"
    PERL_LOCAL_LIB_ROOT="/home/perl_ss/perl5"
    PERL_MB_OPT="--install_base /home/perl_ss/perl5"
    PERL_MM_OPT="INSTALL_BASE=/home/perl_ss/perl5"
  @INC:
    /home/perl_ss/perl5/lib/perl5/aix-thread-multi
    /home/perl_ss/perl5/lib/perl5/aix-thread-multi
    /home/perl_ss/perl5/lib/perl5
    /usr/local/lib/site_perl/5.8.8/aix-thread-multi
    /usr/local/lib/site_perl/5.8.8
    /usr/local/site_perl/common
    /usr/opt/perl5/lib/site_perl/5.20.1/aix-thread-multi
    /usr/opt/perl5/lib/site_perl/5.20.1
    /usr/opt/perl5/lib/5.20.1/aix-thread-multi
    /usr/opt/perl5/lib/5.20.1
    /usr/opt/perl5/lib/site_perl/5.8.8
    /usr/opt/perl5/lib/site_perl

Re: Unknown charnames when building Encode
by kcott (Archbishop) on Jan 19, 2017 at 13:20 UTC

    G'day yulivee07,

    My Perl (Mac OS X):

    $ perl -v | head -2 | tail -1
    This is perl 5, version 24, subversion 0 (v5.24.0) built for darwin-thread-multi-2level

    Both 'LATIN SMALL LETTER SHARP S' and 'POUND SIGN' are valid and work for me:

    $ perl -C -E 'say "\N{LATIN SMALL LETTER SHARP S}"'
    ß
    $ perl -C -E 'say "\N{POUND SIGN}"'
    £
    
    "It seems AIX perl is unable to find the \N{alpha} character. I am a bit lost here - where does perl usually search for characters like this?"

    Without additional code, 'alpha' doesn't work by itself. I get the same error as you:

    $ perl -C -E 'say "\N{alpha}"'
    Unknown charname 'alpha' at -e line 1, within string
    Execution of -e aborted due to compilation errors.

    You can use 'greek:alpha':

    $ perl -C -E 'say "\N{greek:alpha}"'
    α
    

    I note from the source of t/Encode.t:

    use charnames qw(greek);

    That also works for me (see charnames):

    $ perl -Mcharnames=greek -C -E 'say "\N{alpha}"'
    α
    
    "Can someone provide some debugging tips? "

    When running any tests, bear in mind that different versions of Perl support different levels of Unicode.
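    A quick way to see which Unicode version a given perl implements is Unicode::UCD's UnicodeVersion() function. This is only a sketch; run it with each perl being compared (here the AIX 5.20.1 and the Linux 5.18 builds) and compare the output.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Return the Unicode version string (a string such as "6.3.0") that this
# perl's character tables were generated from.
sub unicode_version {
    return Unicode::UCD::UnicodeVersion();
}

printf "perl %vd implements Unicode %s\n", $^V, unicode_version();
```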

    I note from your "perl -V" output that @INC contains some 5.8.8 paths before the 5.20.1 paths. It's possible that old versions of Unicode-related modules are being found first.
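    To see whether an older copy is shadowing the 5.20.1 one, a small script can list every copy of the relevant files in @INC search order; perl uses the first hit. This is a sketch, and the file list is an assumption: charnames.pm and _charnames.pm, plus unicore/Name.pl, which the 5.20 _charnames.pm appears to load its name table from via @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of $relpath (e.g. "charnames.pm") found under @dirs,
# in search order. Duplicate directories and code refs in @INC are skipped.
sub find_all_copies {
    my ($relpath, @dirs) = @_;
    my (%seen, @hits);
    for my $dir (@dirs) {
        next if ref $dir || $seen{$dir}++;
        my $file = File::Spec->catfile($dir, split m{/}, $relpath);
        push @hits, $file if -f $file;
    }
    return @hits;
}

for my $rel (qw(charnames.pm _charnames.pm unicore/Name.pl)) {
    my @copies = find_all_copies($rel, @INC);
    print "$rel:\n";
    print "  (not found)\n" unless @copies;
    # The first copy is the one a plain require/do would pick up.
    printf "  %-6s %s\n", ($_ == 0 ? 'used' : 'hidden'), $copies[$_] for 0 .. $#copies;
}
```

Any "hidden" entry from a 5.8.8 directory listed above a 5.20.1 one would be worth a closer look.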

    Take a look in the code (*.pm, *.t, etc.) shown in your error messages for the modules being used, then check which versions you have. For instance, I have:

    $ perl -E 'use charnames (); say $charnames::VERSION'
    1.43

    For Perl 5.20.1, that should be 1.40; for 5.8.8 it should be 1.05. You can go to the latest distribution (http://search.cpan.org/~shay/perl-5.24.1/); use the dropdown list to select 5.20.1, 5.8.8, or any other version; then follow the link to the wanted module (e.g. charnames).
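    Checking versions one module at a time gets tedious. A sketch like the following loads each module and prints its $VERSION together with the file it was actually loaded from (taken from %INC). The module list is only an example; extend it with whatever your error messages mention.

```perl
use strict;
use warnings;

# Load a module by name and report ($VERSION or undef, path from %INC).
# On failure returns (undef, undef, $error).
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return (undef, undef, $@) unless eval { require $file; 1 };
    my $version = do { no strict 'refs'; ${"${module}::VERSION"} };
    return ($version, $INC{$file});
}

for my $module (qw(charnames _charnames Unicode::UCD Encode)) {
    my ($version, $path, $err) = module_info($module);
    if (defined $path) {
        printf "%-13s %-6s %s\n", $module, $version // 'undef', $path;
    }
    else {
        printf "%-13s failed to load: %s", $module, $err;
    }
}
```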

    You can check whether your system handles these characters at all, without any name lookup, by using their codepoints:

    $ perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"'
    α - ß - £
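    Because \N{...} is resolved at compile time, a single unknown name aborts the whole script. charnames::vianame() does the lookup at runtime instead, so a sketch like this can check several names in one run and report each one separately. It assumes, per the charnames documentation, that vianame returns the code point for a full name and undef when the name is unknown.

```perl
use strict;
use warnings;
use charnames ();    # load charnames without changing how \N{} is parsed here

# Names taken from the failing tests, with the code points they should map to.
my %expected = (
    'GREEK SMALL LETTER ALPHA'   => 0x3B1,
    'LATIN SMALL LETTER SHARP S' => 0xDF,
    'POUND SIGN'                 => 0xA3,
);

# Runtime lookup: a failure yields false instead of a compilation error.
sub name_resolves_to {
    my ($name, $want) = @_;
    my $got = charnames::vianame($name);
    return defined $got && $got == $want;
}

for my $name (sort keys %expected) {
    printf "%-28s U+%04X %s\n", $name, $expected{$name},
        name_resolves_to($name, $expected{$name}) ? 'ok' : 'NOT RESOLVED';
}
```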
    

    Unicode::UCD might provide some useful information. For instance, to check in which version of Unicode a character first appeared:

    $ perl -MUnicode::UCD=charprops_all -E 'say charprops_all("U+$_")->{Age} for qw{3b1 df a3}'
    V1_1
    V1_1
    V1_1

    For any of the tests above, you could try manipulating @INC first, e.g. move the 5.8.8 paths to the end of the list.
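    As a sketch of that, assuming the old entries are exactly those with a "5.8.8" path component (as in the @INC listing in the question), a BEGIN block can reorder @INC before any Unicode-related module is loaded. Removing /usr/local/lib/site_perl/5.8.8 from PERL5LIB would take care of the entries that come from the environment without touching any code.

```perl
use strict;
use warnings;

# Return @dirs with every path that has a "5.8.8" component moved to the end,
# keeping the relative order inside each group. Code refs stay in front.
sub demote_old_paths {
    my @dirs   = @_;
    my $is_old = sub { !ref $_[0] && $_[0] =~ m{/5\.8\.8(?:/|$)} };
    return ((grep { !$is_old->($_) } @dirs), (grep { $is_old->($_) } @dirs));
}

# Must run before charnames, Encode, etc. are loaded to have any effect.
BEGIN { @INC = demote_old_paths(@INC) }

print "$_\n" for @INC;
```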

    Also, it could be helpful to know exactly what your build process is: manual, cpan, etc.

    See also: any links in http://perldoc.perl.org/perl.html with descriptions matching /Unicode/; http://www.unicode.org/.

    — Ken

      Hi Ken, thank you very much for your suggestions. I updated my post with the perl commands you suggested. I just noticed something interesting on the Linux machine when running perl -d t/Encode.t:
      DB<8> x $uni
      0 '\x{03B1}'
      DB<9> x is "\N{alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
      Unknown charname 'alpha' at (eval 23)[/usr/share/perl/5.18/perl5db.pl:732] line 2, within string
      DB<10> x is "\N{greek:alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
      ok 6 - alpha does not map to symbol 'a'
      0 1
      So apparently \N{alpha} also fails on the Linux machine, but somehow the test case still passes as "ok". Strange.
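      One possible explanation, offered as an assumption rather than something verified against perl5db.pl: use charnames is lexically scoped, so a \N{alpha} compiled outside the scope of t/Encode.t's use charnames qw(greek) (as code typed at a debugger prompt may be) cannot see the greek setting. This sketch shows the scoping effect with string evals, which inherit the pragmas in effect where they are written:

```perl
use strict;
use warnings;

# Each entry holds [result of the comparison, $@] for one scope.
my %result;

# No "use charnames" here: "alpha" is not a full character name, so the
# string fails to compile and eval returns undef with the error in $@.
$result{outside} = [ scalar(eval q{ "\N{alpha}" eq "\x{3B1}" }), $@ ];

{
    use charnames qw(:full greek);    # lexically scoped: this block only
    $result{inside} = [ scalar(eval q{ "\N{alpha}" eq "\x{3B1}" }), $@ ];
}

for my $where (qw(outside inside)) {
    my ($ok, $err) = @{ $result{$where} };
    print $ok ? "$where: ok\n" : "$where: failed: $err";
}
```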

      Unfortunately, changing alpha to greek:alpha didn't help on the AIX machine. AIX doesn't know about greek:alpha either.

      For now, my guess is that the Unicode tables on Linux and AIX differ. I am trying to find a way to display all available aliases of a Unicode code point, something like this:
      $unicode_module->get_char_aliases('U+03B1');
      GREEK SMALL LETTER ALPHA
      ALPHA
      alpha
      Anyway, I will keep searching. Thanks for all your help so far.

        Firstly, it seems I gave you something of a bum steer regarding &Unicode::UCD::charprops_all. I've been using it for a while and forgot that it was a fairly recent function (in terms of Perl versions). It was added in v5.22.0, along with some other functions, so it won't be available on any of the Perl versions you're using. Sorry about that. See perl5220delta: Updated Modules and Pragmata.
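        On 5.18 and 5.20, Unicode::UCD's charinfo() (a much older function than charprops_all) can stand in for simple name lookups: it returns a hash reference whose name field holds the character name. A minimal sketch:

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Return the Unicode name of a code point according to Unicode::UCD,
# or undef if charinfo() has nothing for it.
sub ucd_name {
    my ($cp) = @_;
    my $info = charinfo($cp);
    return $info ? $info->{name} : undef;
}

printf "U+%04X %s\n", $_, ucd_name($_) // '(unknown)' for 0x3B1, 0xDF, 0xA3;
```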

        [Just a quick note on markup. While it's generally preferable to use '<code>' tags for code and data, which you've been doing, this doesn't work too well with Unicode characters (outside the ASCII range). In these cases, '<pre>' tags work better: for instance, you'll see 'α' instead of '&#945;'. For inline text, such as in paragraphs, '<tt>' tags serve the same purpose.]

        As I have neither Ubuntu nor AIX, I can't effectively reproduce your results. However, I looked into this a bit further and have a few other suggestions.

        As you successfully printed the characters from the codepoints:

        $ perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"'
        α - ß - £
        

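        [I didn't need it, but you may need -CS rather than plain -C to get rid of the "Wide character" message you're seeing: on its own, -C only switches the standard handles to UTF-8 when the locale environment variables (LC_ALL, LC_CTYPE, LANG) indicate a UTF-8 locale. -Mutf8 won't help here, as it only tells perl that the source code is UTF-8.]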

        See what names you get for those characters:

        $ perl -Mcharnames=:full -E 'say charnames::viacode($_) for (0x3b1, 0xdf, 0xa3)'
        GREEK SMALL LETTER ALPHA
        LATIN SMALL LETTER SHARP S
        POUND SIGN

        In terms of what you're referring to as "aliases", I suspect there's a very large number of these. Have a look at charnames; in particular, read what it says about :full, :loose and :short. There's a link to the algorithm for :loose matching, but it's horribly broken: it should be "http://www.unicode.org/reports/tr44/#Matching_Names". How :short is determined is explained on the charnames page.

        :full is fairly straightforward:

        $ perl -C -E 'say "\N{GREEK SMALL LETTER ALPHA}"'
        α
        

        Based on that #Matching_Names algorithm, I then tried:

        $ perl -C -E 'say "\N{greek small letter alpha}"'
        Unknown charname 'greek small letter alpha' at -e line 1, within string
        Execution of -e aborted due to compilation errors.

        However, when I specified -Mcharnames=:loose, it worked:

        $ perl -Mcharnames=:loose -C -E 'say "\N{greek small letter alpha}"'
        α
        

        Bearing in mind the :loose algorithm, you can see there's a huge number of possibilities. Here are a few examples:

        $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK_SMALL_LETTER_ALPHA}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL-LETTER-ALPHA}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL_LETTER-ALPHA}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{greek_small_letter_alpha}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{greek-small_letter-alpha}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{greek small-letter alpha}"'
        α
        $ perl -Mcharnames=:loose -C -E 'say "\N{GrEeK SmAlL-LeTtEr aLpHa}"'
        α
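        To make the rule concrete, here is a rough sketch of the UAX44-LM2 loose key (ignore case, whitespace, underscores and medial hyphens). It only illustrates the matching rule behind the #Matching_Names link; perl's own :loose lookup lives in _charnames.pm and may differ in detail, and the spec's single exception (U+1180 HANGUL JUNGSEONG O-E) is deliberately not handled.

```perl
use strict;
use warnings;
use charnames ();

# Reduce a character name to a loose-matching key: lower-case it, drop
# medial hyphens (hyphens with a letter or digit on both sides), then drop
# all whitespace and underscores.
sub loose_key {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/(?<=[a-z0-9])-(?=[a-z0-9])//g;    # medial hyphens
    $key =~ s/[\s_]+//g;                         # whitespace and underscores
    return $key;
}

my $official = charnames::viacode(0x3B1);
for my $spelling ('GREEK_SMALL_LETTER_ALPHA', 'greek-small_letter-alpha', 'GrEeK SmAlL-LeTtEr aLpHa') {
    printf "%-26s %s\n", $spelling,
        loose_key($spelling) eq loose_key($official) ? 'matches' : 'differs';
}
```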
        

        Now, as shown in my earlier post, I was able to use the :short forms directly:

        $ perl -C -E 'say "\N{greek:alpha}"'
        α
        $ perl -Mcharnames=greek -C -E 'say "\N{alpha}"'
        α
        

        They didn't work for you, but maybe these might:

        $ perl -Mcharnames=:short -C -E 'say "\N{greek:alpha}"'
        α
        $ perl -Mcharnames=:short,greek -C -E 'say "\N{alpha}"'
        α
        

        I also had a brief look at the source code for charnames.pm and _charnames.pm, though I didn't delve into them too deeply. There are a lot of (non-POD) comments that may be of interest. Perhaps have a look at those for the versions you're using.
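        Since the warnings point at specific lines (459 and 499) of _charnames.pm, one way to read the code and comments around them, in the copy actually being loaded, is to print a window of lines from the file recorded in %INC. The line range here is only an example:

```perl
use strict;
use warnings;

# Return lines $from..$to of $file, each prefixed with its line number.
sub show_lines {
    my ($file, $from, $to) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my @out;
    while (my $line = <$fh>) {
        push @out, sprintf("%5d: %s", $., $line) if $. >= $from;
        last if $. >= $to;
    }
    close $fh;
    return @out;
}

# _charnames.pm is the module named in the warnings; %INC records its path.
if (eval { require _charnames; 1 }) {
    print show_lines($INC{'_charnames.pm'}, 450, 465);
}
else {
    print "Could not load _charnames.pm: $@";
}
```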

        — Ken

Re: Unknown charnames when building Encode
by Anonymous Monk on Jan 19, 2017 at 10:18 UTC
      Further inspection of the problem leads me to the conclusion that this may not be related to Encode. I fired up the debugger and performed some tests. Notice how the character \N{greek:alpha} was found on Linux, but not on AIX:
      AIX 7.2:
      DB<1> use charnames qw(greek)
      DB<2> print "\N{alpha}"
      Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 459.
      [...]
      Unknown charname 'alpha' at (eval 9)[/usr/opt/perl5/lib/5.20.1/perl5db.pl:732] line 2, within string
      DB<3> print "\N{U+03B1}"
      Wide character in print at (eval 10)[/usr/opt/perl5/lib/5.20.1/perl5db.pl:732] line 2.
      [...]
      α
      DB<4> print "\N{greek:alpha}"
      Use of uninitialized value $txt in pattern match (m//) at /usr/opt/perl5/lib/5.20.1/_charnames.pm line 459.
      [...]
      Unknown charname 'greek:alpha' at (eval 11)[/usr/opt/perl5/lib/5.20.1/perl5db.pl:732] line 2, within string
      Ubuntu 14.04:
      DB<1> use charnames qw(greek)
      DB<2> print "\N{alpha}"
      Unknown charname 'alpha' at (eval 8)[/usr/share/perl/5.18/perl5db.pl:732] line 2, within string
      DB<3> print "\N{U+03B1}"
      Wide character in print at (eval 9)[/usr/share/perl/5.18/perl5db.pl:732] line 2.
      α
      DB<4> print "\N{greek:alpha}"
      Wide character in print at (eval 10)[/usr/share/perl/5.18/perl5db.pl:732] line 2.
      α
      I guess something is messed up with the AIX character tables?
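      One way to test the character-table theory directly: the warning is about an undefined $txt, and the 5.20 _charnames.pm appears to fill $txt by running do "unicore/Name.pl" through @INC (an assumption worth confirming in your copy of the file). This sketch repeats that single step on its own and reports which file was used, or why it failed:

```perl
use strict;
use warnings;

# Load the name table roughly the way _charnames.pm is assumed to, and
# return ($table, undef) on success or (undef, $reason) on failure.
sub load_name_table {
    local $@;
    delete $INC{'unicore/Name.pl'};    # so %INC reflects only this attempt
    my $table = do 'unicore/Name.pl';
    return ($table, undef) if defined $table;
    my $reason = $@                      ? "compile error: $@"
               : $INC{'unicore/Name.pl'} ? "$INC{'unicore/Name.pl'} returned undef"
               : $!                      ? "could not read: $!"
               :                           'not found in @INC';
    return (undef, $reason);
}

my ($table, $reason) = load_name_table();
if (defined $table) {
    printf "loaded %s (%d characters)\n", $INC{'unicore/Name.pl'}, length $table;
}
else {
    print "failed: $reason\n";
}
```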

Node Type: perlquestion [id://1179897]
Approved by Marshall
Front-paged by Corion