PerlMonks  

UTF-8 characters in variable names: some characters are not allowed

by kikuchiyo (Hermit)
on Sep 06, 2009 at 17:51 UTC

kikuchiyo has asked for the wisdom of the Perl Monks concerning the following question:

According to the perlunicode manpage, use utf8 allows Unicode (UTF-8 encoded) characters not only in string literals but also in identifier names.

So I had the urge to try the following program:
    use utf8;
    my $€ = 1;
    print $€;
And it failed with this error message:
    Malformed UTF-8 character (unexpected end of string) at utf8test.pl line 3.
    Unrecognized character \x82 in column 5 at utf8test.pl line 3.
This version, however, ran correctly:
    use utf8;
    my $á = 1;
    print $á;
So it seems that while certain Unicode characters can be used in variable names, others cause an error.

I wrote a little script to test which characters are supported.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use charnames ':full';

    my ($fh, $rfh);
    my ($vname, $errorcode, $message, $col, $byte);

    # The report is opened in append mode, so repeated runs accumulate.
    open($rfh, '>>:encoding(UTF-8)', 'utf8report.txt') || die 'Error opening file';

    # You may want to change these numbers if the script runs for too long
    for (0x100..0x9fff) {
        # Write a tiny program that uses the character as a variable name.
        open($fh, '>:encoding(UTF-8)', 'utf8test.pl') || die 'Error opening file';
        print $fh "use utf8;\n";
        $vname = pack "U", $_;
        print $fh "my \$$vname = $_;\nprint \$$vname;";
        close $fh;

        # Syntax-check it and capture the diagnostics.
        #system "perl", "-c", "utf8test.pl";
        $message = `perl -c utf8test.pl 2>&1`;
        # \d+ so that columns above 9 are captured in full
        ($byte, $col) = $message =~ /character \\x(..).*column (\d+)/;
        $errorcode = ($? >> 8) ? "FAIL at byte ".($col-4)."($byte)" : "PASS";
        print $rfh "$_\t$errorcode, character ".(charnames::viacode($_))."\n";

        # Progress indicator
        print $_-$_%100,"\r";
    }
    print "\n";
    close $rfh;
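The message-parsing step of the script can be tried on its own. This is only a sketch: parse_failure is a made-up helper name, the sample text is the error message quoted earlier in this post, and other perl versions may word the diagnostic differently, in which case the pattern would not match.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the offending byte and the column out of
# perl's "Unrecognized character" diagnostic.
sub parse_failure {
    my ($message) = @_;
    # \\x matches a literal backslash followed by "x";
    # \d+ allows column numbers above 9.
    my ($byte, $col) = $message =~ /character \\x(..).*column (\d+)/;
    return ($byte, $col);
}

# Sample diagnostic copied from the error message shown in the post.
my $sample = 'Unrecognized character \x82 in column 5 at utf8test.pl line 3.';
my ($byte, $col) = parse_failure($sample);
print "byte=$byte column=$col\n";    # byte=82 column=5
```

If the pattern does not match, both values come back undefined, so a caller would need to check them before doing arithmetic on the column.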
The results (omitted here) show that indeed, certain characters don't seem to be eligible as variable names.

The question, then, is why?

Of course, there is not much point in asking this, as using Unicode in variable names is still a bad idea, according to many. Yet a Perl hacker should be able to use the euro sign (for example) as a variable name if he so chooses.

(The test was run on Windows XP with Camelbox Perl 5.10.0.)

Replies are listed 'Best First'.
Re: UTF-8 characters in variable names: some characters are not allowed
by ikegami (Patriarch) on Sep 06, 2009 at 18:05 UTC
    Perl identifiers must consist of word characters (alphanumerics and underscore). "a" and "a acute" are word characters. "dollar sign" and "euro sign" are not.
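A minimal sketch of that rule, assuming "word character" means anything that matches \w under Unicode rules. The helper name is_word_char is made up for illustration, and the exact rules for identifiers may differ between Perl versions.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper (not part of Perl): true if a single character matches \w.
sub is_word_char {
    my ($char) = @_;
    # Force the internal UTF-8 representation so that characters in the
    # 0x80-0xFF range are matched with Unicode rules on older perls too.
    utf8::upgrade($char);
    return $char =~ /\A\w\z/ ? 1 : 0;
}

# "a", "a acute", dollar sign, euro sign
for my $cp (0x61, 0xE1, 0x24, 0x20AC) {
    printf "U+%04X %-32s %s\n",
        $cp,
        charnames::viacode($cp),
        is_word_char(chr $cp) ? 'word character' : 'not a word character';
}
```

The euro sign is a currency symbol rather than a letter, digit, or underscore, so it falls in the same group as the dollar sign here.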
      That's correct - I just wish the error message was clearer. It says "Malformed UTF-8 character" even though the file is valid UTF-8. I'll search the Perl RT and open a ticket if none is open already.

      Update: I didn't find an open ticket for it, so I opened a new one.
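To illustrate why the message is misleading, here is a small sketch using the core Encode module; utf8_hex is a hypothetical helper name. It shows that the euro sign encodes to three bytes, the second of which is the \x82 named in the error, and that a strict decode accepts those bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: hex dump of the UTF-8 encoding of one code point.
sub utf8_hex {
    my ($cp) = @_;
    my $bytes = encode('UTF-8', chr $cp);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

print 'EURO SIGN (U+20AC) as UTF-8: ', utf8_hex(0x20AC), "\n";    # E2 82 AC

# A strict decode dies on malformed input, so getting the original
# code point back suggests the byte sequence itself is well formed.
my $bytes   = encode('UTF-8', chr 0x20AC);
my $decoded = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
print "strict decode ok\n" if ord($decoded) == 0x20AC;
```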

        I didn't clue into that. Yeah, that's not an appropriate error message.
