PerlMonks  

Ranges in case insensitive regexps in unicode mode

by IlyaM (Parson)
on Jun 06, 2003 at 14:04 UTC (#263674)
IlyaM has asked for the wisdom of the Perl Monks concerning the following question:

Can anybody please explain to me why these two one-liners emit different results?
ilya@juil:~$ perl -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
true
ilya@juil:~$ perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
false
I checked the docs (perlunicode, perlre and utf8) but I didn't notice anything that would explain this behavior.

Knowing that unicode support in Perl is very new and changes with each new release, I guess it is worth mentioning that I still use 5.6.1.

--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org

Re: Ranges in case insensitive regexps in unicode mode
by broquaint (Abbot) on Jun 06, 2003 at 14:08 UTC
    'tis a bug in 5.6.1 and its less-than-sturdy unicode support, which has been corrected in 5.8:
    shell> perl5.8.0 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
    true
    shell> perl5.6.1 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
    false

    HTH

    _________
    broquaint

Re: Ranges in case insensitive regexps in unicode mode
by jmcnamara (Monsignor) on Jun 06, 2003 at 14:10 UTC

    It looks like the behaviour changed (i.e. was fixed) between 5.6 and 5.8:
    $ perl5.6.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    false
    $ perl5.8.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    true

    --
    John.

Re: Ranges in case insensitive regexps in unicode mode
by december (Pilgrim) on Jun 06, 2003 at 22:55 UTC

    # perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    false
    # perl -v
    This is perl, v5.6.1 built for i386-linux
    --
    # perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    true
    # perl -v
    This is perl, v5.8.0 built for i386-openbsd

    I advise you to update to perl 5.8.0, because I have noticed similar errors in the utf and other charset conversion/comparison functions in perl 5.6.1. Some things just don't seem to work as expected. I'm not a utf or perl expert, but 5.8.0 seems to be more consistent, so an upgrade is worthwhile if you have to do a lot of utf/charset-related things.


       december
