Unless you're using an ancient version of Perl, \w should match any Unicode word character. According to perlre, it matches over 100,000 characters.
use 5.010;
use strict;
use warnings;
use utf8::all;    # CPAN module: UTF-8 source, UTF-8 I/O layers, decoded @ARGV
my $string = "the café";
say "GOT: $1" if $string =~ /(\w{4})/;    # prints "GOT: café"
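Make sure your strings are being interpreted as character strings rather than byte strings, though. Data read from a file or socket arrives as bytes unless you decode it or put an encoding layer on the handle. (See perlunicode and utf8.)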
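To illustrate the character-versus-byte distinction, here is a minimal sketch using only the core Encode module. The helper name first_word4 and the hard-coded byte sequence are made up for the example; in real code the bytes would come from whatever input you are reading.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "the café", as they might arrive from a file or
# socket read without an encoding layer (é is the two bytes C3 A9).
my $bytes = "the caf\xC3\xA9";

# Decoding turns the bytes into a character string (é becomes one character).
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: return the first run of four word characters,
# or undef if there is none.
sub first_word4 {
    my ($s) = @_;
    return $s =~ /(\w{4})/ ? $1 : undef;
}

my $from_chars = first_word4($chars);
my $from_bytes = first_word4($bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "lengths: %d bytes, %d characters\n", length $bytes, length $chars;
print "character string: ",
    (defined $from_chars ? $from_chars : '(no match)'), "\n";
print "byte string: ",
    (defined $from_bytes ? 'matched something' : '(no match)'), "\n";
```

On the decoded string the regex captures the whole word, accent included. On the raw bytes the é is two separate octets, so whatever the regex does with them, it cannot capture the word as four characters.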