PerlMonks  

Getting mad with CGI::Application and utf8

by isync (Hermit)
on Feb 26, 2008 at 10:31 UTC ( [id://670272]=perlquestion )

isync has asked for the wisdom of the Perl Monks concerning the following question:

Monks,
I am in deep trouble here and am pulling my hair out.

I am trying to implement a web application that cleanly operates in utf8 with CGI::Application. Is it possible?

That's where I am:
  • my code is in utf8: use utf8; at the beginning of application.cgi and all modules.

  • A side note: I use
    use CGI::Fast();
    to enable fastcgi in my application.cgi

  • In an act of helplessness I added
    use open ':utf8'; use open ':std';
    to application.cgi, without really knowing what it does (and actually it seems to do nothing).

  • In Application.pm I add $self->header_add(-charset => 'utf-8'); in setup { }

  • After serious trouble with my MySQL data (where I set everything to utf8: db, connection etc.; see this thread), I found out that my Perl script only outputs this data so that it displays correctly in the browser with
    my $flagOn = $html->output; Encode::_utf8_on($flagOn); return $flagOn;
    Actually, I think this only works in my Linux browser environment; I saw a hint of malfunction on Windows... esoteric.

  • Finally, file uploads came into play. That's when I dropped my conversion layer module for all param stuff and began to manually add
    my $param_f = decode("utf8", $q->param("f"));
    to all my subroutines. Now, occasionally (it also happens under plain CGI, not just FastCGI, and seems to be connected to overall load), my upload crashes, with CGI.pm (version 3.33) throwing things like "Malformed UTF-8 character (unexpected non-continuation byte 0xd9, 1 byte after start byte 0xee, expected 3 bytes) in index at (eval 139) line 15." in Apache's error.log, and sending "CGI.pm: Server closed socket during multipart read (client aborted?)." to the browser via Carp. Last idea: is it Apache or my script?


Update: I spawned another thread, as the file upload problem seems to be loosely connected to this problem.

Replies are listed 'Best First'.
Re: Getting mad with CGI::Application and utf8
by moritz (Cardinal) on Feb 26, 2008 at 10:49 UTC
    • Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?
    • use open ':utf8' only affects open (iirc), so it's useless unless you open files.
    • Do you use binmode STDOUT, ':utf8';?
    • Don't use Encode::_utf8_on($flagOn); - it's an internal method of the Encode module, and shouldn't be called from the outside. Use $string = decode_utf8 $string; instead.
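To illustrate the last point, here is a minimal sketch of validated decoding. The sample strings are made up for illustration; the malformed one simply reuses the two bytes named in the error message from the question. It assumes strict checking is wanted, which is why FB_CROAK is passed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes for "café", as they might arrive from a form or a database.
my $bytes = "caf\xc3\xa9";

# decode() validates the bytes and returns a character string.
# FB_CROAK makes it die on malformed input; LEAVE_SRC keeps $bytes untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $text  = decode('UTF-8', $bytes, $check);

printf "bytes: %d, characters: %d\n", length($bytes), length($text);

# These two bytes are not valid UTF-8, so decode() refuses them
# instead of silently flagging them as text the way _utf8_on would.
my $malformed  = "\xee\xd9";
my $decoded_ok = eval { decode('UTF-8', $malformed, $check); 1 } ? 1 : 0;
print "malformed input accepted: $decoded_ok\n";
```

The difference from _utf8_on is that bad input is caught at the boundary rather than surfacing later as a "Malformed UTF-8 character" warning somewhere deep inside another module.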

    Now - occasionally! (which it also does under simple cgi not fast cgi operation and seems to be connected to overall load) - my upload crashes with CGI.pm (version 3.33) throwing "Malformed utf8" in apache's error.log

    What does "occasionally" mean? Does it always die for the same set of data?

    And finally: does the data contain malformed UTF-8?

      At first, thanks for your reply!

      Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?
      No, as I don't know offhand how.

      use open ':utf8' only affects open (iirc), so it's useless unless you open files.
      Ok. But, hm, what about caches, like File::Cache? Should I turn on :utf8 for open when I use these, or will the module handle it internally?

      Do you use binmode STDOUT, ':utf8';?
      I did; actually I used binmode STDOUT, ":encoding(utf8)";. But then it seems to break CGI-Application-Plugin-CompressGzip! (more hair pulling...) Should I set it again? And should I also set binmode STDIN, ":encoding(utf8)" again (or is this redundant with my my $param_f = decode("utf8", $q->param("f")) procedure)?

      Don't use Encode::_utf8_on($flagOn); - it's an internal method of the Encode module, and shouldn't be called from the outside. Use $string = decode_utf8 $string; instead.
      I know. But I inspected the returned strings (local $Data::Dumper::Useqq = 1;) and found out they were properly formed utf8. Until they passed through the final stages of CGI::Application, which broke them again. So I tried various solutions and found out that yes, the string was proper utf8 but without the flag on. When I switched it on manually, C::A left the strings alone and seems to pass them on to the browser stage. (you see, I am deeply woven in trouble...)

      Update: I now use decode_utf8 and it works just as well.

      Regarding testdata
      I tested with a 30K tar.gz, a 1K perl script and a 500K mp3 file - each the same problem, they sometimes come through..
        You can check if the UTF-8 flag is set with Devel::Peek, which can dump all the internal flags of a scalar variable.

        There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different (not sure yet, haven't really tried. Perhaps the difference that I thought I had noticed came from somewhere else.)
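A minimal sketch of both checks, using a made-up sample string rather than real CGI::Fast output (checking what CGI::Fast actually returns would mean passing one of its param values instead):

```perl
use strict;
use warnings;
use Encode qw(decode);
use Devel::Peek qw(Dump);

# Raw UTF-8 bytes and the character string decoded from them.
my $bytes = "caf\xc3\xa9";
my $text  = decode('UTF-8', $bytes);

# utf8::is_utf8 reports only the internal flag, nothing about validity.
my $bytes_flag = utf8::is_utf8($bytes) ? 1 : 0;
my $text_flag  = utf8::is_utf8($text)  ? 1 : 0;
print "bytes flag: $bytes_flag, text flag: $text_flag\n";

# Dump writes the scalar's internals to STDERR; look for UTF8 in the FLAGS line.
Dump($text);
```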

      Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?

      Please note that the absence of the UTF8 flag does not mean that the string is not a text string.

      Checking if the UTF8 flag is set should be done only by people who know about Perl's Unicode internals. To all other people, it will only add to the confusion.

      Properly decode and encode, and all will be fine. (Though you may need an occasional utf8::upgrade, see also Unicode::Semantics.)
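A small sketch of what that means in practice, with made-up sample strings: a text string can lack the flag, utf8::upgrade changes only the internal representation, and decoding at input plus encoding at output keeps text and bytes apart.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A text string whose characters all fit in one byte: "café", no UTF8 flag.
my $latin       = "caf\xe9";
my $flag_before = utf8::is_utf8($latin) ? 1 : 0;

# utf8::upgrade switches the internal storage; the string's value is unchanged.
my $upgraded = $latin;
utf8::upgrade($upgraded);
my $flag_after = utf8::is_utf8($upgraded) ? 1 : 0;
my $same_text  = ($latin eq $upgraded) ? 1 : 0;
print "flag before: $flag_before, after: $flag_after, same text: $same_text\n";

# Decode bytes when they enter the program, encode text when it leaves.
my $input  = decode('UTF-8', "caf\xc3\xa9");   # bytes -> characters
my $output = encode('UTF-8', $input);          # characters -> bytes
printf "characters: %d, output bytes: %d\n", length($input), length($output);
```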

Re: Getting mad with CGI::Application and utf8
by isync (Hermit) on Feb 26, 2008 at 11:58 UTC
    Is there someone out there who has a complete script/recipe which gets CGI::Application working with utf8?

    I found this excellent post by Andrew which addresses the problem, but I don't have the module writing/namespace mapping skills to get his modules to work with CGI::Application.
    I tried... But it breaks too many things. C::A::Plugin::DBH uses DBI in an abstraction layer, so I would have to edit/write my own C::A::Plugin::DBH. Then, I use CGI::Fast which is an abstraction on top of CGI.pm, so I also need a new CGI::Fast, which I tried but had no luck... So many things...
Re: Getting mad with CGI::Application and utf8
by Juerd (Abbot) on Feb 26, 2008 at 20:43 UTC

    moritz already pointed out that you shouldn't do _utf8_on. The :utf8 IO layer does the same thing and should also not be used. Using _utf8_on or :utf8 may result in serious malfunction and security holes.
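For comparison, a sketch of a checked layer, assuming :encoding(UTF-8) is the intended alternative (the post does not name one). It writes to an in-memory handle so the resulting bytes can be inspected; for real output the equivalent would be binmode STDOUT, ':encoding(UTF-8)'.

```perl
use strict;
use warnings;

# Character string "café" (U+00E9 for the last letter).
my $text = "caf\x{e9}";

# In-memory filehandle with an encoding layer that converts characters to UTF-8 bytes.
my $buf = '';
open(my $out, '>:encoding(UTF-8)', \$buf)
    or die "cannot open in-memory handle: $!";
print {$out} $text;
close($out) or die "close failed: $!";

printf "wrote %d bytes for %d characters\n", length($buf), length($text);
```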

      There is another approach, one I use from time to time: don't use Perl's broken UTF-8 support. Pass the argument -C0 to perl at the head of your script, and perl will treat all strings as raw binary. As long as you only break your strings on well-defined boundaries, and don't attempt to naively count characters, it will work fine. If you need to truncate a string at a certain arbitrary point, such as "100 bytes" or so, you will have to make sure it stops at a well-defined UTF-8 endpoint.
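A rough sketch of such a truncation on raw bytes. The helper name truncate_utf8_bytes is hypothetical, and the code assumes the input is already valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a UTF-8 byte string to at most $max bytes
# without splitting a multi-byte sequence. Assumes valid UTF-8 input.
sub truncate_utf8_bytes {
    my ($bytes, $max) = @_;
    return $bytes if length($bytes) <= $max;

    my $cut = $max;
    # If the first excluded byte is a continuation byte (10xxxxxx),
    # the cut falls inside a sequence, so back up to its start byte.
    $cut-- while $cut > 0
        && (ord(substr($bytes, $cut, 1)) & 0xC0) == 0x80;

    return substr($bytes, 0, $cut);
}

my $raw   = "caf\xc3\xa9!";                  # 6 bytes: "café!" in UTF-8
my $short = truncate_utf8_bytes($raw, 4);    # a cut at 4 would split the "é"
my $whole = truncate_utf8_bytes($raw, 5);    # a cut at 5 falls on a boundary
printf "%d bytes, %d bytes\n", length($short), length($whole);
```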

Node Type: perlquestion [id://670272]
Approved by svenXY