Unicode and regexes

by hotshot (Prior)
on Oct 30, 2002 at 09:32 UTC ( #209014=perlquestion )
hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys!

I'm checking the overhead of supporting Unicode in my Perl project. As far as I have seen so far, without using any Unicode module (utf8), Perl just "gives what she gets". For example, when I use opendir to list the directories under a given directory, and some of them have Korean or German names (in UTF-8), Perl receives the names and displays them properly.

The problem starts when I try to manipulate a directory name with a regular expression. Does this mean I'll have to change all my regexps (and there are endless regexps) to support Unicode (using IsAlnum and '-' for \w, for example)? Will the regexps become much more complicated (longer), and will they lose some of the power of the old ones?


Edited: ~Wed Oct 30 16:38:08 2002 (GMT) by footpad: Retitled (was Unicode), added <P> tags, and fixed minor spelling errors - per Consideration

Re: Unicode and regexes
by dakkar (Hermit) on Oct 30, 2002 at 17:51 UTC

    The regexps, per se, don't need any change (I'm assuming Perl 5.8.0, since 5.6.x had some problems). You need to ensure two things:

    1. that your strings are correctly encoded
    2. that Perl knows it

    The first is a problem in itself, but a bit off-topic.

    The second can be done in two ways:

    1. if the strings come from a filehandle, you can use something like open(FH, "<:utf8", "file") to tell Perl to treat the data as UTF-8 (or use the :encoding layer, see perldoc -f open)
    2. otherwise (such as your example, from a dirhandle), use Encode; and $string=Encode::decode("utf-8",$string);
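
    To illustrate the second case, here is a minimal sketch, assuming Perl 5.8 or later with the Encode module. The helper decode_name and the sample bytes are hypothetical stand-ins for a name read from a dirhandle; they are not from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (an assumption for this sketch): turn raw UTF-8
# bytes, such as those returned by readdir, into a Perl character string.
sub decode_name {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# The UTF-8 bytes for "Müller", standing in for a directory name.
my $raw  = "M\xC3\xBCller";
my $name = decode_name($raw);

# The same, unchanged regexp is applied to both strings.
my $raw_ok  = ($raw  =~ /^\w+$/) ? 1 : 0;   # byte string
my $name_ok = ($name =~ /^\w+$/) ? 1 : 0;   # decoded character string

printf "raw:     length=%d matched=%d\n", length($raw),  $raw_ok;
printf "decoded: length=%d matched=%d\n", length($name), $name_ok;
```

    The point of the sketch is that the regexp itself stays as it was; only the string is decoded before matching. Remember to encode again (Encode::encode) before passing a decoded name back to the filesystem.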
      And what if I still use Perl 5.6.1?

