Re^2: CGI::Application - Which is the proper way of handling and outputting utf8

Hi rhesa, thank you for posting this. I'm using your code in my app with two efficiency tweaks:

- Check for the 'setting a param' case first.
- Use utf8::decode() instead of decode_utf8(). Due to a bug in Encode, decode_utf8() always sets the UTF8 flag, even for ASCII-only text. utf8::decode() doesn't set the flag in that case, so the faster ASCII semantics can be used where possible. (Based on ikegami's comment below, maybe I should say "where safe" instead of "where possible".) See "Behaviour of Encode::decode_utf8 on ASCII".

package CGI::as_utf8;  # add UTF-8 decode capability to CGI.pm
BEGIN {
  use strict;
  use warnings;
  use CGI 3.47;  # earlier versions have a UTF-8 double-decoding bug
  {   no warnings 'redefine';
      my $param_org = \&CGI::param;
      my $might_decode = sub {
          my $p = shift;
          # make sure upload() filehandles are not modified
          return $p if !$p || ( ref $p && fileno($p) );
          utf8::decode($p);  # in place; on invalid UTF-8 it returns false and leaves $p unchanged
          $p
      };
      *CGI::param = sub {
          # setting a param goes through the original interface
          goto &$param_org if scalar @_ != 2;
          my $q = $_[0];    # assume object calls always
          my $p = $_[1];
          return wantarray
              ? map { $might_decode->($_) } $q->$param_org($p)
              : $might_decode->( $q->$param_org($p) );
      }
  }
}
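
For anyone who wants to check the flag behaviour on their own installation, here is a minimal sketch using only core modules. It is not part of the module above, and the variable names are made up for illustration. It prints the UTF8 flag after each kind of decode (without assuming what decode_utf8() does on your Encode version) and shows how utf8::decode() treats multi-byte and invalid input.

```perl
use strict;
use warnings;
use Encode ();

# ASCII-only input: compare the UTF8 flag after each decoder.
my $bytes      = "abc";
my $via_encode = Encode::decode_utf8($bytes);   # returns a new string
my $via_core   = $bytes;
utf8::decode($via_core);                        # decodes in place

printf "decode_utf8():  UTF8 flag = %d\n", utf8::is_utf8($via_encode) ? 1 : 0;
printf "utf8::decode(): UTF8 flag = %d\n", utf8::is_utf8($via_core)   ? 1 : 0;

# Multi-byte input: two bytes become one character, and the flag is set.
my $e_acute    = "\xC3\x89";                    # UTF-8 bytes for "É"
my $decoded_ok = utf8::decode($e_acute);
printf "length = %d, UTF8 flag = %d\n",
    length($e_acute), utf8::is_utf8($e_acute) ? 1 : 0;

# Invalid input: a lone 0xC9 byte is not well-formed UTF-8.
my $bad    = "\xC9";
my $bad_ok = utf8::decode($bad);                # false; $bad is left as it was
printf "invalid input decoded? %s\n", $bad_ok ? "yes" : "no";
```

Because utf8::decode() reports failure through its return value only, a caller who cares about malformed input has to check that value explicitly.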


Replies are listed 'Best First'.
Re^3: CGI::Application - Which is the proper way of handling and outputting utf8 by ikegami (Patriarch) on Mar 11, 2010 at 02:32 UTC
"so the faster ASCII semantics can be used where possible."

Almost. The UTF8=1 format is still used unnecessarily if the string is "É", for example. To always use the UTF8=0 format when possible, you'd have to add the following after decoding:

`utf8::downgrade($p, 1);`

It's safer not to do that, though, as it affects `\w`, `uc()`*, buggy XS, etc.

* `\w` and `uc()` are unaffected when using `use 5.012;` or `use feature qw( unicode_strings );`.
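
A small sketch of that effect, using core functions only. The variable names are illustrative, and it uses "é" rather than "É" so that uc() has something to change. It assumes no `use 5.012;`, no `unicode_strings` feature and no locale in scope, which is the situation the warning is about.

```perl
use strict;
use warnings;   # deliberately no "use 5.012" or "use feature 'unicode_strings'"

my $p = "\xC3\xA9";                 # UTF-8 bytes for "é"
utf8::decode($p);                   # one character, stored in the UTF8=1 format

my $flag_before = utf8::is_utf8($p)  ? 1 : 0;
my $w_before    = $p =~ /\w/         ? 1 : 0;   # Unicode rules: "é" is a word char
my $uc_before   = uc($p) eq "\xC9"   ? 1 : 0;   # Unicode rules: uc gives "É"

utf8::downgrade($p, 1);             # same character, now in the UTF8=0 format

my $flag_after  = utf8::is_utf8($p)  ? 1 : 0;
my $w_after     = $p =~ /\w/         ? 1 : 0;   # byte rules: no longer matches
my $uc_after    = uc($p) eq "\xC9"   ? 1 : 0;   # byte rules: uc leaves it alone

printf "before: flag=%d \\w=%d uc=%d\n", $flag_before, $w_before, $uc_before;
printf "after:  flag=%d \\w=%d uc=%d\n", $flag_after,  $w_after,  $uc_after;

# With a true second argument, a string that cannot be downgraded is left
# untouched and a false value is returned instead of dying.
my $wide    = "\x{263A}";
my $wide_ok = utf8::downgrade($wide, 1);
```

The string compares equal before and after the downgrade; only the internal format changes, yet `\w` and `uc()` give different answers.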
Re^4: CGI::Application - Which is the proper way of handling and outputting utf8 by Anonymous Monk on Dec 14, 2010 at 10:01 UTC
Is there a way to have imports automatically forwarded to CGI? For example, with `use CGI::as_utf8 qw/-no_xhtml/;`, -no_xhtml is not forwarded to CGI!
Re^5: CGI::Application - Which is the proper way of handling and outputting utf8 by davebaker (Pilgrim) on Oct 29, 2020 at 20:52 UTC
I think, with the current version of CGI.pm, you could get the -no_xhtml pragma into CGI.pm by putting this into your application module. (It overrides the cgiapp_get_query method in the CGI::Application parent.)

`sub cgiapp_get_query { my $self = shift; use CGI ('-no_xhtml'); my $q = CGI->new; return $q; }`

The current version of CGI.pm doesn't use that particular pragma, but the same code would work for some other pragma, such as `-utf8`.
Re^6: CGI::Application - Which is the proper way of handling and outputting utf8 by Anonymous Monk on Oct 30, 2020 at 00:12 UTC
Re^7: CGI::Application - Which is the proper way of handling and outputting utf8 by davebaker (Pilgrim) on Oct 30, 2020 at 15:26 UTC
Re^5: CGI::Application - Which is the proper way of handling and outputting utf8 by ikegami (Patriarch) on Dec 15, 2010 at 23:52 UTC
I think the following will do the trick.

`sub import { shift; unshift(@_, 'CGI'); goto &CGI::import; }`
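
To see what that forwarding does without needing CGI.pm installed, here is a sketch with two hypothetical packages: My::Base stands in for CGI and My::Wrapper for CGI::as_utf8. Neither name comes from the thread. The point it illustrates is that `goto &sub` replaces the wrapper's stack frame, so the target's import() sees the original caller and the original arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI: its import() just records what it was given.
package My::Base;
our @seen;
sub import {
    my ($class, @args) = @_;
    push @seen, {
        class  => $class,
        args   => [@args],
        caller => scalar caller,   # package that appears to have called us
    };
}

# Hypothetical wrapper, playing the role of CGI::as_utf8.
package My::Wrapper;
sub import {
    shift;                         # drop 'My::Wrapper'
    unshift @_, 'My::Base';        # make it look like My::Base->import(...)
    goto &My::Base::import;        # replaces this frame; @_ is passed along
}

package main;

# "use My::Wrapper qw/-no_xhtml/;" would make this call at compile time.
My::Wrapper->import('-no_xhtml');

my $call = $My::Base::seen[0];
printf "%s->import(%s) as seen from package %s\n",
    $call->{class}, join(', ', @{ $call->{args} }), $call->{caller};
```

Because the apparent caller is the user's package rather than the wrapper, an Exporter-style import() in the real module would export into the right namespace.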

