PerlMonks  

Unicode and regexes

by hotshot (Prior)
on Oct 30, 2002 at 09:32 UTC (#209014=perlquestion)
hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys!

I'm checking the overhead of supporting Unicode in my Perl project. As far as I can tell so far, without using any Unicode module (utf8), Perl just "gives what she gets". For example, when I used opendir to get the list of directories under a given directory, and some of them were named in Korean or German (in UTF-8), Perl received the names and displayed them properly.

The problem starts when I try to manipulate the directory names with regular expressions. Does this mean I'll have to change all my regexes (and there are endless regexes) to support Unicode (for example, using IsAlnum and '-' instead of \w)? The regexes would become much more complicated (longer) and wouldn't have all the power of the old ones.
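The problem can be reproduced in a few lines. This is a minimal sketch, assuming Perl 5.8 or later with the core Encode module; the name "Müller" is a made-up example, not from the original post. It runs one unchanged \w+ regex against the raw UTF-8 bytes (what readdir hands back) and against the same name decoded into characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical directory name as it arrives from readdir(): raw UTF-8 bytes.
# "M\xC3\xBCller" is the UTF-8 encoding of "Müller" (u-umlaut takes two bytes).
my $raw = "M\xC3\xBCller";

# The same name after decoding it into a Perl character string.
my $chars = decode('UTF-8', $raw);

# The very same regex on both: collect the runs of word characters.
my @raw_words  = $raw   =~ /(\w+)/g;
my @char_words = $chars =~ /(\w+)/g;

printf "bytes:      %d chars, %d word run(s)\n", length $raw,   scalar @raw_words;
printf "characters: %d chars, %d word run(s)\n", length $chars, scalar @char_words;
```

On the byte string the two bytes of the umlaut are not word characters under Perl's default rules, so \w+ splits the name into "M" and "ller"; on the decoded string it matches the whole name. The regex itself did not change, only what Perl knows about the data.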

Hotshot

Edited: ~Wed Oct 30 16:38:08 2002 (GMT) by footpad: Retitled (was Unicode), added <P> tags, and fixed minor spelling errors - per Consideration

Re: Unicode and regexes
by dakkar (Hermit) on Oct 30, 2002 at 17:51 UTC

    The regexps, per se, don't need any change (I'm assuming Perl 5.8.0, since 5.6.x had some problems). You need to ensure two things:

    1. that your strings are correctly encoded
    2. that Perl knows it

    The first is a problem in itself, but a bit off-topic.

    The second can be done in two ways:

    1. if the strings come from a filehandle, you can use something like open(FH, "<:utf8", "file") to tell Perl to treat the data as UTF-8 (or use the :encoding layer; see perldoc -f open)
    2. otherwise (as in your example, reading from a dirhandle), use Encode; and decode explicitly with $string=Encode::decode("utf-8",$string);
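The two approaches in the list above might be sketched roughly as follows, assuming Perl 5.8 or later on a system whose file names are UTF-8. The helper names read_names and list_dir, and the file name names.txt, are hypothetical. The sketch uses the :encoding(UTF-8) layer rather than :utf8, since :encoding(UTF-8) validates the input while :utf8 only marks it as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Way 1: data from a filehandle. The :encoding(UTF-8) layer decodes on read.
# (read_names is a hypothetical helper; e.g. read_names('names.txt').)
sub read_names {
    my ($file) = @_;
    open(my $fh, '<:encoding(UTF-8)', $file) or die "Can't open $file: $!";
    chomp(my @names = <$fh>);
    close $fh;
    return @names;
}

# Way 2: data from a dirhandle. readdir() returns bytes, so decode explicitly.
# (list_dir is a hypothetical helper; it skips "." and "..".)
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    my @names = map { decode('UTF-8', $_) } grep { !/^\.\.?\z/ } readdir $dh;
    closedir $dh;
    return @names;
}

# Encode characters back to UTF-8 on output (avoids "Wide character" warnings).
binmode(STDOUT, ':encoding(UTF-8)');

for my $name (list_dir('.')) {
    # Unchanged regex: on decoded names \w also matches letters like u-umlaut.
    print "$name\n" if $name =~ /^\w+\z/;
}
```

The decoded names are character strings. If you pass one back to open or opendir, it is generally safest to encode it to bytes first, e.g. with Encode::encode('UTF-8', $name).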
      And what if I'm still using Perl 5.6.1?

      Hotshot

Node Type: perlquestion [id://209014]
Approved by Tanalis