Hi all,
I have the following little script demonstrating a case (case 1 in the output below) that I can't explain. I hope someone can explain it to me or give me the right hints.
#!/usr/bin/perl -CO
use strict;
use warnings;
use Encode;
use utf8;
# Case 1: character string (UTF8 flag on), no locale in effect
my $a = 'ä';
print "UTF8-Flag: ", utf8::is_utf8($a) ? "Yes" : "No";
print " matches word: ", $a =~ /\w/ ? "Yes\n" : "No\n";
# Case 2: ISO-8859-1 encoded byte string, no locale in effect
my $b = encode("ISO-8859-1", $a);
print "UTF8-Flag: ", utf8::is_utf8($b) ? "Yes" : "No";
print " matches word: ", $b =~ /\w/ ? "Yes\n" : "No\n";
# From here on, "use locale" is in effect
use locale;
# Case 3: character string (UTF8 flag on) under "use locale"
$a = 'ä';
print "UTF8-Flag: ", utf8::is_utf8($a) ? "Yes" : "No";
print " matches word: ", $a =~ /\w/ ? "Yes\n" : "No\n";
# Case 4: ISO-8859-1 encoded byte string under "use locale"
$b = encode("ISO-8859-1", $a);
print "UTF8-Flag: ", utf8::is_utf8($b) ? "Yes" : "No";
print " matches word: ", $b =~ /\w/ ? "Yes" : "No";
print "\n";
The output on a Linux box with locale de_DE.UTF-8 and Perl source code encoded in UTF-8 is:
UTF8-Flag: Yes matches word: Yes
UTF8-Flag: No matches word: No
UTF8-Flag: Yes matches word: No
UTF8-Flag: No matches word: No
It's the very first case that I can't explain.
Why does a Unicode-flagged 'ä' match \w when locale is not enabled explicitly?
Thanks in advance
Andreas