PerlMonks  

HTML Form > PERL Script in UTF-8 issue

by Jishanator (Initiate)
on Aug 29, 2012 at 19:29 UTC ( #990536=perlquestion )
Jishanator has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks, I've searched far and wide and I climbed the mountain of knowledge for assistance.

I have an HTML form (saved in UTF-8 encoding, with French text). Once users enter their information, the form is posted with accept-charset="UTF-8" and runs a Perl script. This script prints the values submitted from the form and asks for confirmation.

I've set binmode on STDOUT to UTF-8, which lets the HTML text that the script itself prints show all the accents correctly. This all works fine until a field in the form contains an accented or special character (in the name field, for example).

The Perl script prints its own content fine, but when it prints a variable that came from the form itself, the accents become jumbled characters.

I'm not exactly sure why this is happening or how to solve it.

Re: HTML Form > PERL Script in UTF-8 issue
by choroba (Abbot) on Aug 29, 2012 at 19:32 UTC
    How do you read the form parameters in your script?
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Do you use CGI? Dancer? Catalyst? Do you set encoding in the header? Meta tag?

        Please, help us to help you.

        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: HTML Form > PERL Script in UTF-8 issue
by remiah (Hermit) on Aug 29, 2012 at 20:30 UTC

    It seems like you have to decode your inputs, but

    
                        input
    
                          +
      +-------------------|--------------------------+
      | io layer          | to perl internal utf8    | <--+HERE maybe you don't have this layer
      +-------------------|------------------------->|
      |                   v                          |
      |                                              |
      |                                              |
      |        character semantics of Perl           |
      |                                              |
      |                                              |
      |                   +                          |
      +-------------------|-------------------------->
      | io layer          | to encoded utf8          |  <--+binmode STDOUT, "encoding(UTF-8)";
      +-------------------|--------------------------+
                          |
                          v
    
                        output
    
    , as choroba says, it is important to know how you are doing it. Could you show us a small example that demonstrates it?
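The diagram above can be sketched in a few lines. This is only an illustration under assumptions: the sample name and the `hex_bytes` helper are made up, and the real script may read its input quite differently.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# Raw bytes as a browser would submit them: "e acute" is C3 A9 in UTF-8.
my $raw = "Andr\xC3\xA9";

# Input boundary: bytes become a Perl character string.
my $text = decode('UTF-8', $raw);
printf "bytes in: %d, characters inside: %d\n", length($raw), length($text);

# Output boundary: encode exactly once. Printing to a handle set up with
# binmode(STDOUT, ':encoding(UTF-8)') performs this same step.
print "decoded first: ", hex_bytes(encode('UTF-8', $text)), "\n";

# Skipping the input step means the raw bytes are treated as Latin-1
# characters and encoded again, which is the "jumbled accents" symptom.
print "never decoded: ", hex_bytes(encode('UTF-8', $raw)), "\n";
```

The first line of hex ends in `C3 A9`, the same bytes that came in; the second ends in `C3 83 C2 A9`, which a browser renders as two junk characters instead of one accented letter.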

      Your diagram was very helpful. Visualizing it, we were able to figure out that the underlying problem was that the variable was never decoded.

      $input{'formfieldname'} = Encode::decode('UTF-8', $input{'formfieldname'});

      Fixed it right up. Thank you.

        use CGI qw( -utf8 );

        This will do the decoding automatically for you; the caveat is that it can cause problems with file uploads. Otherwise it's quite handy.

        You may have fixed it, or you may not have. I would recommend checking the version of CGI.pm with a command like this:

        perl -MCGI -e 'print $CGI::VERSION'

        From "Character Encodings in Perl", which moritz sometimes cites for Unicode issues:

        Special care must be taken when reading POST or GET parameters with the function param in the module CGI. Older versions (prior to 3.29) always returned byte strings, newer version return text strings if charset("UTF-8") has been called before, and byte strings otherwise. 
        
        So it is possible that you are using an older version of CGI.pm. If someone upgrades CGI.pm one day, you could be in trouble, because input that param already returns as text would then be decoded a second time.
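To make that risk concrete, here is a minimal sketch of what a second decode does to a value that is already text. It does not use CGI.pm at all; the sample value and the `code_points` helper are assumptions for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list the code points of a string for inspection.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $string);
}

my $raw   = "Andr\xC3\xA9";          # bytes, as an older CGI.pm would return them
my $once  = decode('UTF-8', $raw);   # correct: five characters ending in U+00E9
my $twice = decode('UTF-8', $once);  # what happens if the value was already decoded

print "once:  ", code_points($once),  "\n";
print "twice: ", code_points($twice), "\n";   # the accent is no longer U+00E9
```

With a lone accented Latin-1 character the second decode typically yields a replacement character; with characters above U+00FF, Encode can refuse to decode at all. Either way, decoding should happen in exactly one place.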

        Sometimes an older version of a module decodes differently. I stumbled over this with an older LWP::Simple, and troubles like these really gave me a hard time.

      HTML Form = Saved as UTF-8 File. Post > form.pl

      In a required file, I set binmode (STDOUT, "utf8"); then, from form.pl, I require a file that runs a foreach loop doing this:

      #!/usr/bin/perl
      use Encode;
      # Decode each submitted field from UTF-8 bytes into Perl characters
      foreach $field (@fields) {
          $input{$field} = Encode::decode('UTF-8', $input{$field});
      }

      form.pl prints the information back to the user for confirmation; when that is submitted, we move on to form2.pl.

      form2.pl writes everything to an external file and then hands off to success.pl, which pulls the information from the external file and displays it one last time as a sort of "receipt" for the user to print or save.

      So to summarize the process: HTML form > form.pl (decodes from UTF-8 in the foreach loop) > form2.pl > success.pl (decodes in the foreach loop again). If, for example, form.pl doesn't convert, the information comes out all different in success.pl. I'm afraid this might bite me in the ass at a later stage.

        I posted this, forgot to login...
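For the form2.pl > success.pl hand-off described above, one option is to let an I/O layer on the file handle do the conversion, so no second foreach loop is needed. This is only a sketch under assumptions: the `name=value` file format, the temporary file, and the `save_fields` / `load_fields` names are all hypothetical and not taken from the actual scripts.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Write decoded fields to a file; the layer encodes characters to UTF-8 bytes.
sub save_fields {
    my ($path, %fields) = @_;
    open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
    print {$out} "$_=$fields{$_}\n" for sort keys %fields;
    close $out or die "Cannot close $path: $!";
    return;
}

# Read the fields back; the layer decodes UTF-8 bytes to characters.
sub load_fields {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
    my %fields;
    while (my $line = <$in>) {
        chomp $line;
        my ($name, $value) = split /=/, $line, 2;
        $fields{$name} = $value;
    }
    close $in;
    return %fields;
}

# Stand-in for the external file shared by form2.pl and success.pl.
my (undef, $path) = tempfile(UNLINK => 1);

# Decode once at the input boundary, as form.pl does.
my %input = (name => decode('UTF-8', "Andr\xC3\xA9"));

save_fields($path, %input);
my %receipt = load_fields($path);
print "round trip ok\n" if $receipt{name} eq $input{name};
```

The design choice here is that every boundary (form input, file, STDOUT) converts exactly once, and everything in between works on characters.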
Re: HTML Form > PERL Script in UTF-8 issue
by philiprbrenan (Monk) on Aug 29, 2012 at 20:35 UTC

    I think you will find it much easier if you convert all your non-ASCII text to &#xnnnn; format, which your browser will understand, yet which uses only plain ASCII characters.

    sub htmlX($) {    # Convert to HTML &#xnnnn; notation
        my ($s) = @_;
        my $t = '';
        # Keep ASCII as-is; replace anything else with a numeric character reference
        $t .= (ord($_) < 128 ? $_ : sprintf("&#x%04x;", ord($_))) for split(//, $s);
        return $t;
    }

    You might also find this web page helpful: http://www.rishida.net/tools/conversion/

      CGI.pm will gladly do that properly if you specify a charset.
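One caveat with the htmlX approach: it works character by character, so the input still has to be decoded first. A small sketch, with the sub repeated (without the prototype) so it runs on its own and a made-up sample value:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same logic as the htmlX sub above, repeated so this sketch is self-contained.
sub htmlX {
    my ($s) = @_;
    return join '', map { ord($_) < 128 ? $_ : sprintf('&#x%04x;', ord($_)) } split(//, $s);
}

my $raw = "caf\xC3\xA9";   # UTF-8 bytes, as submitted by the form

print htmlX($raw), "\n";                    # one reference per byte
print htmlX(decode('UTF-8', $raw)), "\n";   # one reference per character
```

On raw bytes this prints `caf&#x00c3;&#x00a9;`, which is the same jumbled pair in a different notation; on decoded text it prints `caf&#x00e9;`.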

Re: HTML Form > PERL Script in UTF-8 issue
by Anonymous Monk on Aug 29, 2012 at 22:22 UTC

    I'm not exactly sure why this is happening or how to solve it.

    Show some code
