http://www.perlmonks.org?node_id=312540

ph0enix has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
while working on a web project I see strange behaviour when I use $data = Encode::decode_utf8($data): this code does not set the utf8 flag on $data. The script runs under Apache2 with mod_perl and uses CGI for params handling.

I need to compare a posted value with the old one stored in the database and update the database only if the posted value is different. Looking at the log, I saw that fields containing some characters are updated every time. After a little debugging I found the reason: newly posted values are not marked as utf8...

At the beginning of my script I call an init function like this one:

    sub init {
        my $q = CGI->new;
        for my $param ($q->param) {
            if ($param =~ /^(.+)$/) {
                $Params::vars{$1} .= fix_utf($q->param($param));
            }
        }
        $q->delete_all();
    }

    sub fix_utf {
        my $par = shift;
        my $res = '';
        # decode obtained value to utf8 string if needed
        $res = Encode::decode_utf8($par) if !Encode::is_utf8($par);
        print STDERR "value: '$par'\n",
            "\tflg1: \t", Encode::is_utf8($par) ? 1 : 0, "\n",
            "\tflg2: \t", Encode::is_utf8($res) ? 1 : 0, "\n";
        # # set utf8 flag if previous operation failed
        # Encode::_utf8_on($res) if !Encode::is_utf8($res);
        # print STDERR "\tflg3: \t", Encode::is_utf8($res) ? 1 : 0, "\n";
        return $res;
    }

As you can see, I also tried to set the utf8 flag by calling Encode::_utf8_on, and even when this is uncommented I sometimes get the following output:

    value: 'some note'
        flg1: 0 - original
        flg2: 0 - after decode_utf8()
        flg3: 0 - after _utf8_on()
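For a pure-ASCII value such as 'some note', an unset flag need not change how the string compares. The following is a minimal sketch, not a diagnosis of the problem above: it assumes the value is plain ASCII and uses utf8::upgrade only to force the flagged internal representation on a copy, so that a flagged and an unflagged version can be compared with eq.

```perl
use strict;
use warnings;
use Encode ();

# Sample value taken from the output above; pure ASCII.
my $plain = 'some note';

# Copy of the same text, forced into Perl's internal UTF-8 representation.
my $upgraded = $plain;
utf8::upgrade($upgraded);

# The flag differs between the two copies...
my $flag_plain    = Encode::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = Encode::is_utf8($upgraded) ? 1 : 0;

# ...but eq compares characters, not the internal representation.
my $same = ($plain eq $upgraded) ? 1 : 0;

print "flag (plain):    $flag_plain\n";
print "flag (upgraded): $flag_upgraded\n";
print "equal:           $same\n";
```

If this holds on the Perl in use, a missing flag on ASCII-only input is unlikely by itself to make two otherwise identical values compare as different.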

There are also some warnings in the Apache log I don't understand.

    Use of uninitialized value in require at (eval 35) line 6, <FH> line 9 (#2)
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in require at (eval 35) line 6, <FH> line 9.
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in require at (eval 35) line 6, <FH> line 9.
    Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 64, <FH> line 9 (#2)
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 64, <FH> line 9.
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 64, <FH> line 9.
    Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 78, <FH> line 9 (#2)
    [Fri Dec 5 16:23:17 2003] Exact.pl: Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 78, <FH> line 9.
    [Fri Dec 5 16:23:17 2003] Exact.pl: Use of uninitialized value in require at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 78, <FH> line 9.
    Use of uninitialized value in do "file" at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 137, <FH> line 9 (#2)
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in do "file" at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 137, <FH> line 9.
    [Fri Dec 5 16:23:17 2003] -e: Use of uninitialized value in do "file" at /usr/lib/perl5/5.8.0/utf8_heavy.pl line 137, <FH> line 9.

What's going wrong and how can I fix it?

Thanks ph0enix

Replies are listed 'Best First'.
Re: Encode - can't set utf8 flag on + strange warnings in Apache log
by Anonymous Monk on Dec 05, 2003 at 15:56 UTC
    I can see strange behaviour when I use $data = Encode::decode_utf8($data). This code does not set utf8 flag on for $data.
    That's not strange. Encode::decode_utf8($data) decodes data "from utf" and returns "a sequence of logical characters". It doesn't mark data as utf or convert data to utf.
      Actually, it may mark $data as utf if the operation succeeds. Check the Encode docs.
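To make the distinction between octets and "logical characters" concrete, here is a small sketch. The two-byte sample is an assumption chosen for illustration (the UTF-8 encoding of "é"); it is not data from the original post.

```perl
use strict;
use warnings;
use Encode ();

# Two octets: the UTF-8 encoding of U+00E9 (illustrative sample).
my $octets = "\xc3\xa9";

# decode_utf8 interprets the octets and returns a character string.
my $chars = Encode::decode_utf8($octets);

my $octet_len = length $octets;                  # counts bytes
my $char_len  = length $chars;                   # counts characters
my $codepoint = ord $chars;                      # code point of the single character
my $flag      = Encode::is_utf8($chars) ? 1 : 0; # flag on the decoded, non-ASCII result

printf "octets: %d, chars: %d, code point: U+%04X, flag: %d\n",
    $octet_len, $char_len, $codepoint, $flag;
```

The input and the result hold different data (two bytes versus one character), which is why the two should not be compared with each other directly.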

        Yes, but when I use encode_utf8() to convert the already known value from a string to octets before comparing it with the newly posted value, I can't see any difference between the values.

            my $par1 = $posted_value;        # not marked as utf8 yet
            my $par2 = $already_known_value; # marked as utf8
            my $res  = 0;

        When both params are converted to octets and then compared, the result is: equal.

            # code1 - compare octets
            $par1 = Encode::encode_utf8($par1) if Encode::is_utf8($par1);
            $par2 = Encode::encode_utf8($par2) if Encode::is_utf8($par2);
            $res = ($par1 eq $par2) ? 1 : 0;
            # $res now contains 1 - $par1 and $par2 are equal

        But when I try to convert them to strings and compare afterwards, the result is: not equal, because the posted value is still not marked as utf8. I already use the same form with the same data for posting.

            # code2 - compare strings
            $par1 = Encode::decode_utf8($par1) if !Encode::is_utf8($par1);
            $par2 = Encode::decode_utf8($par2) if !Encode::is_utf8($par2);
            $res = ($par1 eq $par2) ? 1 : 0;
            # $res now contains 0 - $par1 and $par2 are not equal
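One alternative to branching on the flag is to decide by origin: treat the posted value as raw UTF-8 octets and the stored value as an already decoded character string, then decode only the posted side. The sketch below rests on exactly those two assumptions, which may not match the real setup; same_text and the sample values are hypothetical names introduced for illustration, and this does not explain the mismatch reported above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: compare raw UTF-8 octets from a request with a
# character string that is assumed to be decoded already.
sub same_text {
    my ($posted_octets, $stored_chars) = @_;

    # Work on a copy: decode() with a CHECK argument may modify its input.
    my $copy = $posted_octets;

    # FB_CROAK makes malformed input die instead of being patched silently.
    my $posted_chars = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
    return 0 unless defined $posted_chars;

    return $posted_chars eq $stored_chars ? 1 : 0;
}

my $posted = "caf\xc3\xa9";   # octets, as a form might deliver them
my $stored = "caf\x{e9}";     # the same text as characters

my $equal   = same_text($posted, $stored);
my $differs = same_text("caf\xc3\xa9!", $stored);

print "equal: $equal, differs: $differs\n";
```

Deciding by where a value came from, rather than by Encode::is_utf8, avoids depending on an internal flag that ASCII-only strings may or may not carry.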