PerlMonks  

Re^2: CGI::Application - Which is the proper way of handling and outputting utf8

by mrajcok (Initiate)
on Mar 10, 2010 at 17:23 UTC ( [id://827842] )


in reply to Re: CGI::Application - Which is the proper way of handling and outputting utf8
in thread CGI::Application - Which is the proper way of handling and outputting utf8

Hi rhesa, thank you for posting this. I'm using your code in my app with two efficiency tweaks:
  1. Check for the 'setting a param' case first.
  2. Use utf8::decode() instead of decode_utf8(). Due to a bug in Encode, decode_utf8() always sets the UTF8 flag, even for ASCII-only text; utf8::decode() leaves the flag off in that case, so the faster ASCII semantics can be used where possible. (Based on ikegami's comment below, maybe I should say "where safe" instead of "where possible".) See "Behaviour of Encode::decode_utf8 on ASCII".

    package CGI::as_utf8;    # add UTF-8 decode capability to CGI.pm

    BEGIN {
        use strict;
        use warnings;
        use CGI 3.47;    # earlier versions have a UTF-8 double-decoding bug

        {
            no warnings 'redefine';
            my $param_org = \&CGI::param;

            my $might_decode = sub {
                my $p = shift;

                # make sure upload() filehandles are not modified
                return $p if !$p || ( ref $p && fileno($p) );

                # on malformed input this returns false and $p stays as raw bytes
                utf8::decode($p);
                $p;
            };

            *CGI::param = sub {
                # setting a param goes through the original interface
                goto &$param_org if scalar @_ != 2;
                my $q = $_[0];    # assume object calls always
                my $p = $_[1];
                return wantarray
                    ? map { $might_decode->($_) } $q->$param_org($p)
                    : $might_decode->( $q->$param_org($p) );
            };
        }
    }

    1;    # a module file must return a true value
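
To make the second tweak concrete, here is a minimal sketch of how utf8::decode() treats three kinds of input. The variable names and sample strings are made up for illustration; only utf8::decode() and utf8::is_utf8() come from the post and from core Perl. It deliberately does not show decode_utf8(), since its flag behaviour may depend on the installed Encode version.

```perl
use strict;
use warnings;

# ASCII-only bytes: decoding is a no-op and the UTF8 flag stays off.
my $ascii = "hello";
utf8::decode($ascii);
my $ascii_flag = utf8::is_utf8($ascii) ? 1 : 0;

# UTF-8 encoded bytes for "caf" plus U+00E9: decoded in place to 4 characters,
# and the UTF8 flag is switched on because a multi-byte sequence was found.
my $cafe     = "caf\xC3\xA9";
my $cafe_ok  = utf8::decode($cafe);
my $cafe_len = length $cafe;
my $cafe_flag = utf8::is_utf8($cafe) ? 1 : 0;

# Malformed input (0xC3 needs a continuation byte, 0x28 is not one):
# utf8::decode() returns false and leaves the bytes untouched.
my $bad    = "\xC3\x28";
my $bad_ok = utf8::decode($bad);

printf "ascii:  flag=%d\n", $ascii_flag;
printf "cafe:   ok=%d len=%d flag=%d\n", ($cafe_ok ? 1 : 0), $cafe_len, $cafe_flag;
printf "broken: ok=%d len=%d\n", ($bad_ok ? 1 : 0), length $bad;
```

Because the return value is false for malformed input, a caller that cares about bad request data could check it instead of discarding it as the wrapper above does.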

Re^3: CGI::Application - Which is the proper way of handling and outputting utf8
by ikegami (Patriarch) on Mar 11, 2010 at 02:32 UTC

    so the faster ASCII semantics can be used where possible.

    Almost. The UTF8=1 format is still unnecessarily used if the string is "É", for example. You'd have to include the following after decoding if you wanted to always use the UTF8=0 format when possible.

    utf8::downgrade($p, 1);

    It's safer not to do that, though, as it affects \w*, uc()*, buggy XS, etc.

    * — \w and uc() are unaffected when using use 5.012; or use feature qw( unicode_strings );.
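
The downgrade trade-off described above can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the variable names are invented, and the \w results assume the script runs without use 5.012 or feature 'unicode_strings', which is exactly the case the footnote excludes.

```perl
use strict;
use warnings;

# UTF-8 bytes for U+00C9 (E with acute), decoded to one character.
my $e_acute = "\xC3\x89";
utf8::decode($e_acute);
my $flag_decoded = utf8::is_utf8($e_acute) ? 1 : 0;    # UTF8=1 format
my $word_decoded = $e_acute =~ /\w/ ? 1 : 0;           # Unicode rules apply

# Second argument (FAIL_OK) true: return false instead of dying on failure.
my $down_ok = utf8::downgrade($e_acute, 1);
my $flag_downgraded = utf8::is_utf8($e_acute) ? 1 : 0; # UTF8=0 format
my $word_downgraded = $e_acute =~ /\w/ ? 1 : 0;        # byte rules apply

# A character above 0xFF cannot be stored in the UTF8=0 format.
my $smiley    = "\x{263A}";
my $smiley_ok = utf8::downgrade($smiley, 1);

printf "decoded:    flag=%d \\w=%d\n", $flag_decoded, $word_decoded;
printf "downgraded: flag=%d \\w=%d\n", $flag_downgraded, $word_downgraded;
printf "smiley downgrade ok=%d\n", $smiley_ok ? 1 : 0;
```

The same one-character string matches \w before the downgrade and not after it, which is why leaving the decoded string alone is the safer choice.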

      Is there a way to have imports automatically forwarded to CGI?

      For example with:

      use CGI::as_utf8 qw/-no_xhtml/;
      -no_xhtml is not forwarded to CGI!

        I think, with the current version of CGI.pm, you could get the -no_xhtml pragma into CGI.pm by putting this into your application module. (It overrides the cgiapp_get_query method in the CGI::Application parent.)

        sub cgiapp_get_query {
            my $self = shift;
            use CGI ('-no_xhtml');
            my $q = CGI->new;
            return $q;
        }

        The current version of CGI.pm doesn't use that particular pragma, but the code would work if some other pragma were desired, such as -utf8.

        I think the following will do the trick.
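        sub import {
            shift;                 # drop the wrapper's own package name
            unshift(@_, 'CGI');    # make CGI the invocant instead
            goto &CGI::import;     # hand over to CGI's import, keeping the original caller
        }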
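
For anyone wanting to see the forwarding trick in isolation, here is a self-contained sketch that does not need CGI.pm installed. My::Base and My::Wrapper are hypothetical stand-ins for CGI and CGI::as_utf8; the recording logic in My::Base::import exists only to make the effect visible.

```perl
use strict;
use warnings;

package My::Base;       # hypothetical stand-in for CGI
our @seen;              # records every import call for inspection

sub import {
    my ($class, @args) = @_;
    my $importer = caller;    # the package that asked for the import
    push @seen, { class => $class, importer => $importer, args => [@args] };
}

package My::Wrapper;    # hypothetical stand-in for CGI::as_utf8

sub import {
    shift;                       # drop 'My::Wrapper'
    unshift @_, 'My::Base';      # make My::Base the invocant
    goto &My::Base::import;      # replace this frame, so caller() is unchanged
}

package main;

# What "use My::Wrapper qw/-no_xhtml/;" would do at compile time.
My::Wrapper->import('-no_xhtml');

my $call = $My::Base::seen[0];
print "class=$call->{class} importer=$call->{importer} args=@{ $call->{args} }\n";
```

Using goto rather than a plain method call matters here: the wrapper's frame disappears, so the base module sees the original importing package and exports into it rather than into the wrapper.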
