<?xml version="1.0" encoding="windows-1252"?>
<node id="1003447" title="Re: match utf8" created="2012-11-12 08:54:28" updated="2012-11-12 08:54:28">
<type id="11">
note</type>
<author id="757127">
tobyink</author>
<data>
<field name="doctext">
&lt;p&gt;Unless you're using an ancient version of Perl, &lt;c&gt;\w&lt;/c&gt; should match any Unicode word character. According to [mod://perlre], it matches over 100,000 characters.&lt;/p&gt;

&lt;code&gt;
use 5.010;
use strict;
use warnings;
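use utf8::all;   # CPAN module: treats source literals and standard I/O as UTF-8

my $string = "the café";
# \w does not match the space, so the first run of four word characters is "café"
say "GOT: $1" if $string =~ /(\w{4})/;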
&lt;/code&gt;

&lt;p&gt;Make sure your strings are interpreted as character strings rather than byte strings, though. (See [mod://perlunicode] and [mod://utf8].)&lt;/p&gt;
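
&lt;p&gt;To illustrate the difference, here is a minimal sketch that assumes the data arrives as raw UTF-8 bytes (for example from a filehandle opened without a decoding layer). The variable names and the hard-coded input are made up for the example; it uses the core [mod://Encode] module to turn the bytes into characters before matching.&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 encoded bytes of "café".
# The e-acute is stored as the two bytes C3 A9.
my $bytes = "caf\xC3\xA9";

# Decode the bytes into a character string.
my $chars = decode('UTF-8', $bytes);

# The byte string has one element per byte, the character string one per character.
printf "bytes: length %d, all word chars: %s\n",
    length($bytes), ($bytes =~ /\A\w+\z/ ? "yes" : "no");
printf "chars: length %d, all word chars: %s\n",
    length($chars), ($chars =~ /\A\w+\z/ ? "yes" : "no");
&lt;/code&gt;

&lt;p&gt;The undecoded string should report a length of 5 and fail the match, because byte A9 on its own is not a word character. The decoded string should report a length of 4 and match.&lt;/p&gt;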

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-757127"&gt;
&lt;small&gt;&lt;small&gt;
&lt;tt&gt;perl -E'sub Monkey::do{say$_,for@_,do{($monkey=&amp;#x5B;caller(0)]-&gt;&amp;#x5B;3])=~s{::}{ }and$monkey}}"Monkey say"-&gt;Monkey::do'
&lt;/tt&gt;&lt;/small&gt;&lt;/small&gt;
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
1003444</field>
<field name="parent_node">
1003444</field>
</data>
</node>
