PerlMonks  

Re: CGI::Application - Which is the proper way of handling and outputting utf8

by rhesa (Vicar)
on Nov 19, 2007 at 03:35 UTC (#651574)


in reply to CGI::Application - Which is the proper way of handling and outputting utf8

See "Re^4: CGI.pm: automatically decode param()" for another workaround for decoding CGI.pm input.

Here's my complete patch for CGI.pm. The module assumes your pages and forms are always in utf8, and that you always use the OO interface of CGI.pm (which should be the case in CGI::Application).

    package CGI::as_utf8;

    BEGIN {
        use strict;
        use warnings;
        use CGI;
        use Encode;

        {
            no warnings 'redefine';
            my $param_org = \&CGI::param;

            my $might_decode = sub {
                my $p = shift;
                # make sure upload() filehandles are not modified
                return ( !$p || ( ref $p && fileno($p) ) )
                    ? $p
                    : eval { decode_utf8($p) } || $p;
            };

            *CGI::param = sub {
                my $q = $_[0];    # assume object calls always
                my $p = $_[1];
                # setting a param goes through the original interface
                goto &$param_org if scalar @_ != 2;
                return wantarray
                    ? map { $might_decode->($_) } $q->$param_org($p)
                    : $might_decode->( $q->$param_org($p) );
            };
        }
    }

    1;

Usage is simple. Just add a use CGI::as_utf8; in your CGI::Application module(s). It's been battle-tested on a site that does about 7 million cgi hits per day, so it works in practice. Suggestions for improvements are welcome though!
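
For anyone wondering what the decoding step changes in practice, here is a minimal sketch that uses only Encode, outside of CGI.pm. The sample string (the UTF-8 octets for "café") is an assumption chosen for illustration; it stands in for a raw value as param() would return it without the patch.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Raw octets as submitted by a UTF-8 form: "caf" followed by
# the two-octet UTF-8 sequence C3 A9 (e with acute accent).
my $raw = "caf\xC3\xA9";

# Decoding turns the octet pair C3 A9 into the single character U+00E9.
my $decoded = decode_utf8($raw);

# length() counts octets on the raw value and characters on the decoded one.
printf "raw:     %d\n", length $raw;
printf "decoded: %d\n", length $decoded;
```

Without decoding, anything that works per character (length checks, substr, regex character classes) operates on octets instead, which is what the patched param() is meant to avoid.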


Re^2: CGI::Application - Which is the proper way of handling and outputting utf8
by mrajcok (Initiate) on Mar 10, 2010 at 17:23 UTC
    Hi rhesa, Thank you for posting this. I'm using your code in my app with two efficiency tweaks/changes:
    1. check for the 'setting a param' case first
    2. use utf8::decode() instead of decode_utf8(), since due to a bug in Encode, decode_utf8() always sets the UTF8 flag, even for ASCII-only text. utf8::decode() doesn't set the UTF8 flag in this case, so the faster ASCII semantics can be used where possible. (Based on ikegami's comment below, maybe I should say "where safe" instead of "where possible".) See "Behaviour of Encode::decode_utf8 on ASCII".

        package CGI::as_utf8;    # add UTF-8 decode capability to CGI.pm

        BEGIN {
            use strict;
            use warnings;
            use CGI 3.47;    # earlier versions have a UTF-8 double-decoding bug

            {
                no warnings 'redefine';
                my $param_org = \&CGI::param;

                my $might_decode = sub {
                    my $p = shift;
                    # make sure upload() filehandles are not modified
                    return $p if !$p || ( ref $p && fileno($p) );
                    # returns false and leaves $p unchanged if it is not valid UTF-8
                    utf8::decode($p);
                    $p
                };

                *CGI::param = sub {
                    # setting a param goes through the original interface
                    goto &$param_org if scalar @_ != 2;
                    my $q = $_[0];    # assume object calls always
                    my $p = $_[1];
                    return wantarray
                        ? map { $might_decode->($_) } $q->$param_org($p)
                        : $might_decode->( $q->$param_org($p) );
                };
            }
        }

        1;

      so the faster ASCII semantics can be used where possible.

      Almost. The UTF8=1 format is still unnecessarily used if the string is "É", for example. You'd have to include the following after decoding if you wanted to always use the UTF8=0 format when possible.

      utf8::downgrade($p, 1);

      It's safer not to do that, though, as it affects \w*, uc()*, buggy XS, etc.

      * — \w and uc() are unaffected when using use 5.012; or use feature qw( unicode_strings );.
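
To make the flag behaviour discussed above concrete, here is a small sketch using only the built-in utf8:: functions. The variable names and sample strings are assumptions for illustration; utf8::is_utf8() is used purely to inspect the internal flag.

```perl
use strict;
use warnings;

# ASCII-only input: utf8::decode() has nothing to convert,
# so the UTF8 flag stays off.
my $ascii = "hello";
utf8::decode($ascii);
my $ascii_flag = utf8::is_utf8($ascii) ? 1 : 0;

# UTF-8 octets for U+00C9 (capital E with acute accent).
my $e_acute = "\xC3\x89";
utf8::decode($e_acute);
my $flag_decoded = utf8::is_utf8($e_acute) ? 1 : 0;

# U+00C9 fits in a single byte, so downgrade can switch the string
# back to the UTF8=0 format. The second argument (FAIL_OK) keeps it
# from dying on strings that cannot be downgraded.
utf8::downgrade($e_acute, 1);
my $flag_downgraded = utf8::is_utf8($e_acute) ? 1 : 0;

print "ASCII after decode:   UTF8=$ascii_flag\n";
print "E-acute after decode: UTF8=$flag_decoded\n";
print "after downgrade:      UTF8=$flag_downgraded\n";
```

The string holds the same single character before and after the downgrade; only the internal storage format differs, which is why code sensitive to that format (the \w and uc() cases mentioned above) can behave differently.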

        Is there a way to have imports automatically forwarded to CGI?

        For example with:

        use CGI::as_utf8 qw/-no_xhtml/;
        -no_xhtml is not forwarded to CGI!
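
One possible approach, sketched here with stand-in packages rather than CGI itself: give the wrapper its own import() that hands its arguments on to the wrapped module with goto, so the wrapped import() still sees the original caller. My::Base and My::Wrapper are hypothetical names playing the roles of CGI and CGI::as_utf8; whether CGI's own import() handles a second call with pragmas the way you want is an assumption that would need testing against your CGI.pm version.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI: records the import arguments
# and the package that import() was called from.
package My::Base;
our ( @seen_args, $seen_caller );

sub import {
    my ( $class, @args ) = @_;
    @seen_args   = @args;
    $seen_caller = caller;
}

# Hypothetical wrapper playing the role of CGI::as_utf8.
package My::Wrapper;

sub import {
    shift;                      # drop the wrapper's own class name
    unshift @_, 'My::Base';     # make it look like a direct call on the base
    goto &My::Base::import;     # tail call, so caller() still sees the user
}

package main;

# This is what "use My::Wrapper qw/-no_xhtml/;" would do at compile time.
My::Wrapper->import('-no_xhtml');

print "args: @My::Base::seen_args\n";
print "caller: $My::Base::seen_caller\n";
```

Applied to the module in this thread, that would mean adding a similar import() to CGI::as_utf8 with 'CGI' and &CGI::import in place of the stand-ins.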
