http://www.perlmonks.org?node_id=990555


in reply to HTML Form > PERL Script in UTF-8 issue

It seems like you have to decode your inputs, but


                    input

                      +
  +-------------------|--------------------------+
  | io layer          | to perl internal utf8    | <--+ HERE: maybe you don't have this decoding step
  +-------------------|------------------------->|
  |                   v                          |
  |                                              |
  |                                              |
  |        character semantics of Perl           |
  |                                              |
  |                                              |
  |                   +                          |
  +-------------------|-------------------------->
  | io layer          | to encoded utf8          |  <--+binmode STDOUT, ":encoding(UTF-8)";
  +-------------------|--------------------------+
                      |
                      v

                    output
, as choroba says, it is important to know how you are doing it. Could you show us a small example that demonstrates it?
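
To make the diagram concrete, here is a minimal sketch of the three stages using the core Encode module. The sample bytes and variable names are my own assumptions, not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample input: "café" as it would arrive on the wire,
# with "é" encoded as two UTF-8 bytes.
my $raw_bytes = "caf\xC3\xA9";

# Input side of the diagram: UTF-8 bytes -> Perl character string.
my $text = decode('UTF-8', $raw_bytes);

# Character semantics: length counts characters, not bytes.
my $byte_count = length $raw_bytes;   # 5
my $char_count = length $text;        # 4

# Output side of the diagram: characters -> UTF-8 bytes again.
my $out_bytes = encode('UTF-8', uc $text);
```

If the decode step is skipped, everything in the middle box operates on bytes, so functions like length and uc see two separate bytes where you expect one accented character.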

Re^2: HTML Form > PERL Script in UTF-8 issue
by Jishanator (Initiate) on Aug 29, 2012 at 20:39 UTC

    Your graph was very helpful. Visualizing it, we were able to figure out that the underlying problem was the decoding of the variable.

    $input{'formfieldname'} = Encode::decode('UTF-8', $input{'formfieldname'});

    Fixed it right up. Thank you.

      use CGI qw( -utf8 );

      Will do this automatically for you; the caveat being it will cause problems with file uploads. Otherwise it's quite handy.

        This did not work... After thorough research, the problem apparently lies with STDIN: the values from the form always arrive as raw bytes and are never decoded. To resolve this, decode each element from UTF-8. I can't seem to find a "global" way of doing so, so a loop that walks through the array of form elements will apparently have to be my fix.
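
A loop like the one described could be sketched as follows. The %input hash and its contents are hypothetical stand-ins for the raw form fields; only Encode::decode comes from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the hash of raw form fields (UTF-8 bytes).
my %input = (
    name => "Jos\xC3\xA9",
    city => "Z\xC3\xBCrich",
);

# "values" returns aliases to the hash values, so assigning to $_
# here updates the hash in place.
$_ = decode('UTF-8', $_) for values %input;
```

After the loop, every value is a character string, so the decoding happens in exactly one place instead of field by field throughout the script.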

      Either you fixed it or you didn't. I would recommend checking your version of CGI.pm with a command like this:

      perl -MCGI -e 'print $CGI::VERSION'

      From Character Encodings in Perl, which moritz sometimes cites for Unicode issues:

      Special care must be taken when reading POST or GET parameters with the function param in the module CGI. Older versions (prior to 3.29) always returned byte strings, newer version return text strings if charset("UTF-8") has been called before, and byte strings otherwise. 
      
      So it is possible that you are using an older version of CGI.pm. If someone upgrades CGI.pm one day, you will be in trouble.

      Sometimes older modules behave differently when decoding. I stumbled over this with an older LWP::Simple, and these troubles really embarrassed me.
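
One way to hedge against a module changing from byte strings to text strings is to decode only values that still look like raw bytes. The sketch below is an assumption on my part, not something CGI.pm provides: decode_once is a hypothetical helper, and utf8::is_utf8 only inspects Perl's internal flag, so treat it as a heuristic rather than a guarantee.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a value only if it has not been decoded yet.
# utf8::is_utf8 checks Perl's internal flag, which is a heuristic here.
sub decode_once {
    my ($value) = @_;
    return $value if !defined $value || utf8::is_utf8($value);
    return decode('UTF-8', $value);
}

# Assumed sample data: the same word as a byte string and as a text string.
my $byte_string = "na\xC3\xAFve";
my $text_string = decode('UTF-8', $byte_string);

my $first  = decode_once($byte_string);   # gets decoded
my $second = decode_once($text_string);   # passed through untouched
```

Both calls end up with the same five-character string, so the script would keep working whether the parameters arrive decoded or not.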

        You've got me worried now; I'm not too familiar with these things. I found out I have version 3.55 of CGI.pm. It is functioning currently, although I don't trust it long-term, and I don't understand exactly how it's functioning.
        Is 3.55 a red flag or not?
Re^2: HTML Form > PERL Script in UTF-8 issue
by Anonymous Monk on Aug 30, 2012 at 20:20 UTC

    HTML Form = Saved as UTF-8 File. Post > form.pl

    In a required file, I set binmode (STDOUT, "utf8"); then form.pl requires a file that runs a foreach loop like this:

    #!/usr/bin/perl
    use Encode;

    foreach $field (@fields) {
        $input{$field} = Encode::decode('UTF-8', $input{$field});
    }

    form.pl prints the information back to the user for confirmation; when they submit, we move on to form2.pl.

    form2.pl writes everything to an external file, then hands off to success.pl, which reads the information back from that file and displays it one last time as a sort of "receipt" for the user to print or save.

    So to summarize the process: HTML Form > form.pl (converts through foreach to utf8) > form2.pl > success.pl (converts through foreach again). If, for example, form.pl doesn't convert, the information comes out all different in success.pl. I'm afraid this might bite me in the ass at a later stage.
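
For the file step in that chain, one possible alternative to decoding by hand is to put an :encoding(UTF-8) layer on the file handles, so the conversion happens at the I/O boundary in both directions. This is only a sketch under my own assumptions: the temp file stands in for the external file, and the variable names are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed sample data: a character string that has already been decoded.
my $text = "Z\x{FC}rich \x{263A}";

# Hypothetical temp file standing in for the external file form2.pl writes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Writer: the layer encodes characters to UTF-8 bytes on the way out.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text, "\n";
close $out or die "Cannot close $path: $!";

# Reader: the same layer decodes on the way in, so no manual decode is needed.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
chomp(my $read_back = <$in>);
close $in;
```

With matching layers on both ends, the string read back is the same character string that was written, which avoids having to remember which script has or has not converted the data.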

      I posted this, forgot to login...