PerlMonks  

Can't decode ill-formed UTF-8 octet sequence <FF> at /usr/share/perl5/CGI.pm line 1116.

by nikolay (Beadle)
on Dec 30, 2016 at 10:39 UTC ( [id://1178644]=perlquestion )

nikolay has asked for the wisdom of the Perl Monks concerning the following question:

My Perl script fails to receive HTML form data (text fields and uploaded files) and dies with the following error message:

"Can't decode ill-formed UTF-8 octet sequence <FF> at /usr/share/perl5/CGI.pm line 1116."

The Perl code is:

#!/usr/bin/perl -T
use warnings;
use strict;
use CGI;
use CGI::Carp qw ( fatalsToBrowser );
use utf8::all;
use Encode qw/ decode /;

my $a=undef;
my $b=undef;
$a=CGI->new;
$b=$a->param( 'tekst' );
$b=decode( 'UTF-8', $b );
exit 0;

How do I fix this?

Replies are listed 'Best First'.
Re: Can't decode ill-formed UTF-8 octet sequence <FF> at /usr/share/perl5/CGI.pm line 1116.
by hippo (Bishop) on Dec 30, 2016 at 14:00 UTC

    I am unable to reproduce your findings. After removing the apparently unnecessary use utf8::all; the script runs fine to completion with no warnings or errors:

    $ ./1178644.pl "$FOO"
    $ perl -v

    This is perl 5, version 20, subversion 3 (v5.20.3) built for x86_64-linux-thread-multi
    (with 16 registered patches, see perl -V for more detail)

    Copyright 1987-2015, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl".  If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    $

    with $FOO set to be "tekst=Οὐχὶ ταὐτὰ παρίσταταί". This is using version 4.21 of CGI. Which version are you using and what input are you supplying to produce the stated error message?

      Thank you for your research and for the idea of removing the use utf8; ! I would never have guessed that!

      By the way, why do you think it is unnecessary here? As I understand it, that directive tells Perl about the script's encoding, doesn't it?

        Yes, it does that and some more (see the documentation). Since there are no utf8 characters in your script, and since you are handling the CGI parameter decoding yourself explicitly, there is no need to use this module. Without it everything seems to run smoothly for me and, according to your follow-up post above, also for you.
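
A minimal sketch of decoding a parameter by hand, as described above, assuming the value arrives as raw octets. The helper name decode_utf8_or_undef is hypothetical; decode, FB_CROAK and LEAVE_SRC come from the core Encode module. Passing FB_CROAK makes decode die on ill-formed input, so the eval turns bad bytes such as <FF> into undef instead of a fatal error.

```perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK LEAVE_SRC );

# Hypothetical helper: return the decoded string, or undef if $octets
# is missing or is not well-formed UTF-8.
sub decode_utf8_or_undef {
    my ($octets) = @_;
    return undef unless defined $octets;
    # FB_CROAK: die on malformed input. LEAVE_SRC: do not modify $octets.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return $text;
}

my $ok  = decode_utf8_or_undef("\xD1\x84");   # UTF-8 bytes of one Cyrillic letter
my $bad = decode_utf8_or_undef("\xFF");       # never valid in UTF-8
print "ok:  ", (defined $ok  ? "decoded" : "rejected"), "\n";
print "bad: ", (defined $bad ? "decoded" : "rejected"), "\n";
```

In a CGI script the argument would be the value returned by $q->param('tekst'), provided nothing else (such as utf8::all or $CGI::PARAM_UTF8) has already decoded it.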

      When I remove use utf8::all; , I get a "500 - Internal Server Error" message, and nothing appears in the server (lighttpd) log.

      When I ran the script from the command line, like

      1.pl tekst="фывафыа"

      I got no error message with utf8::all turned off, and

      Status: 500
      Content-type: text/html

      <h1>Software error:</h1>
      <pre>Cannot decode string with wide characters at /usr/lib/i386-linux-gnu/perl/5.24/Encode.pm line 202.</pre>
      <p>
      For help, please send mail to this site's webmaster, giving this error message
      and the time and date of the error.
      </p>
      [Sat Dec 31 16:27:02 2016] 1.pl: Cannot decode string with wide characters at /usr/lib/i386-linux-gnu/perl/5.24/Encode.pm line 202.

      with it turned on. That simply shows that the string has already been decoded from UTF-8.

      My software versions are: perl 5.24.1 and CGI 4.35.
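
To illustrate the "Cannot decode string with wide characters" message quoted above, here is a small sketch, assuming only the core Encode module. The helper name try_decode is hypothetical. It decodes the same data twice: the first call turns UTF-8 octets into characters, and the second call fails because its input is no longer a byte string.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: try to decode $input as UTF-8 and return
# the result together with any error text (empty string on success).
sub try_decode {
    my ($input) = @_;
    my $text = eval { decode('UTF-8', $input) };
    my $err  = defined $text ? '' : $@;
    return ($text, $err);
}

my $octets = "\xD1\x84\xD1\x8B";            # UTF-8 bytes for two Cyrillic letters
my ($chars, $err1) = try_decode($octets);   # first decode: octets to characters
my (undef,  $err2) = try_decode($chars);    # second decode: input has wide characters
print "first:  ", length($chars), " characters\n";
print "second: $err2";
```

If something earlier in the program has already decoded the input, the explicit decode call is the one to drop.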

Re: Can't decode ill-formed UTF-8 octet sequence <FF> at /usr/share/perl5/CGI.pm line 1116.
by Corion (Patriarch) on Dec 30, 2016 at 10:43 UTC

    Maybe your browser is not uploading the data as UTF-8 at all?

    What data is your browser sending?

      How can I find that out?

        By looking at what your browser sends, potentially via Wireshark or maybe the Mozilla HTTP Live Headers extension. You can also dump what the script receives directly by not using CGI and printing everything that the script reads from STDIN to a file.

        Most likely, your browser tells CGI that it is sending UTF-8 but isn't sending UTF-8. But without seeing the headers and data, that's always hard to tell.
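
The suggestion above of bypassing CGI and saving whatever arrives on STDIN could be sketched roughly as follows. This is only an illustration: the function name dump_request_body and the dump path are hypothetical, and it assumes the server passes the POST body on STDIN with its size in the CONTENT_LENGTH environment variable, as the CGI interface specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy up to $length bytes from filehandle $in
# to the file $path, byte for byte. Returns the number of bytes copied.
sub dump_request_body {
    my ($in, $length, $path) = @_;
    binmode $in;                                  # raw octets, no decoding layer
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    my $copied = 0;
    while ($copied < $length) {
        my $want = $length - $copied;
        $want = 8192 if $want > 8192;             # read in chunks
        my $got = read $in, my $buf, $want;
        last unless $got;                         # EOF or read error
        print {$out} $buf;
        $copied += $got;
    }
    close $out or die "Cannot close $path: $!";
    return $copied;
}

# Only act when actually invoked by a web server as a CGI script.
if ($ENV{GATEWAY_INTERFACE}) {
    my $path = '/tmp/cgi-body.dump';              # hypothetical location
    my $n = dump_request_body(\*STDIN, $ENV{CONTENT_LENGTH} || 0, $path);
    print "Content-Type: text/plain\r\n\r\n";
    print "CONTENT_TYPE: ", ($ENV{CONTENT_TYPE} // '(none)'), "\n";
    print "Saved $n bytes to $path\n";
}
```

The dump file can then be inspected with a hex viewer to see whether the bytes for the form field really are UTF-8, and the echoed CONTENT_TYPE shows which encoding type and boundary the browser declared.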

Re: Can't decode ill-formed UTF-8 octet sequence <FF> at /usr/share/perl5/CGI.pm line 1116.
by Anonymous Monk on Dec 31, 2016 at 10:11 UTC

    Are you running under mod_perl?

    I cannot replicate under cli :)

    #!/usr/bin/perl -T
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw ( fatalsToBrowser );
    use Data::Dump qw/ pp /;

    my $bom = "bom=\xEF\xBB\xBF";
    my $junk = "junk=\xC3\x2E";
    my $fun = sub {
        my $q = CGI->new( "$bom;$junk");
        my $qbom = $q->param('bom');
        my $qjunk = $q->param('junk');
        $q->charset('UTF-8');
        print $q->header('text/plain'),
          join "\n", pp($bom,$qbom),$qbom, pp($junk,$qjunk),$qjunk, "\n";
    };
    $fun->();
    local $CGI::PARAM_UTF8 = $CGI::PARAM_UTF8 = 1;
    binmode STDOUT, ':encoding(UTF-8)';
    $fun->();
    __END__

    Also, that [CLOSED] stuff shouldn't have [] around it; instead you should add <p><b>update:</b> solved it, it was ...

      I do know. I use lighttpd, not Apache. It has only two modules for running CGI, cgi and fastcgi, and I do not know what is behind either of them.

      Thank you for the great code!

      On CLOSED: one person here says one thing, another says another. I expect that if I follow your advice, the first will explain it his way. So I will stop putting a status on my topics.
