Ranges in case insensitive regexps in unicode mode

IlyaM has asked for the wisdom of the Perl Monks concerning the following question:

Can anybody please explain to me why these two one-liners emit different results?

ilya@juil:~$ perl -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
true
ilya@juil:~$ perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
false

I checked the docs (perlunicode, perlre and utf8), but I didn't notice anything that would explain this behavior.

Knowing that Unicode support in Perl is very new and changes with each new release, I should mention that I am still using 5.6.1.

--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org


Re: Ranges in case insensitive regexps in unicode mode by broquaint (Abbot) on Jun 06, 2003 at 14:08 UTC
'Tis a bug in `5.6.1` and its less than sturdy Unicode support, which has been corrected in `5.8`:

shell> perl5.8.0 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
true
shell> perl5.6.1 -Mutf8 -le 'print "b" =~ /[A-C]/i ? "true" : "false"'
false

HTH

_________
broquaint
Re: Ranges in case insensitive regexps in unicode mode by jmcnamara (Monsignor) on Jun 06, 2003 at 14:10 UTC
It looks like the behaviour changed (i.e. was fixed) between 5.6 and 5.8:

$ perl5.6.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
false
$ perl5.8.0 -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
true

--
John.
Re: Ranges in case insensitive regexps in unicode mode by december (Pilgrim) on Jun 06, 2003 at 22:55 UTC
# perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
false
# perl -v
This is perl, v5.6.1 built for i386-linux
--
# perl -Mutf8 -e 'print "b" =~ /[A-C]/i ? "true\n" : "false\n"'
true
# perl -v
This is perl, v5.8.0 built for i386-openbsd

I advise you to upgrade to Perl 5.8.0, because I have noticed similar errors in the UTF and other charset conversion/comparison functions in Perl 5.6.1. Some things just don't seem to work as expected. I'm not a UTF or Perl expert, but 5.8.0 seems to be more consistent, so if you have to do a lot of UTF/charset-related things...

december
