http://www.perlmonks.org?node_id=263674

IlyaM has asked for the wisdom of the Perl Monks concerning the following question:

Can anybody please explain to me why these two one-liners emit different results?
ilya@juil:~$ perl -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
true
ilya@juil:~$ perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
false
I checked the docs (i.e. perlunicode, perlre and utf8), but I didn't notice anything that would explain such behavior.

Knowing that Unicode support in Perl is very new and changes with each new release, I guess it is worth mentioning that I still use 5.6.1.

--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org

Re: Ranges in case insensitive regexps in unicode mode
by broquaint (Abbot) on Jun 06, 2003 at 14:08 UTC
    'Tis a bug in 5.6.1 and its less-than-sturdy Unicode support, which has been corrected in 5.8:
    shell> perl5.8.0 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
    true
    shell> perl5.6.1 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
    false

    HTH

    _________
    broquaint

Re: Ranges in case insensitive regexps in unicode mode
by jmcnamara (Monsignor) on Jun 06, 2003 at 14:10 UTC

    It looks like the behaviour changed (i.e. was fixed) between 5.6 and 5.8:
    $ perl5.6.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    false
    $ perl5.8.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    true

    --
    John.

Re: Ranges in case insensitive regexps in unicode mode
by december (Pilgrim) on Jun 06, 2003 at 22:55 UTC

    # perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    false
    # perl -v
    This is perl, v5.6.1 built for i386-linux
    --
    # perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
    true
    # perl -v
    This is perl, v5.8.0 built for i386-openbsd

    I advise you to upgrade to perl 5.8.0, because I have noticed similar errors in the conversion/comparison functions between UTF and other charsets in perl 5.6.1. Some things just don't seem to work as expected. I'm not a UTF or perl expert, but 5.8.0 seems to be more consistent, so if you have to do a lot of UTF/charset-related things...


       december