Re^2: Confusing UTF-8 bug in CGI-script

by wanradt (Scribe)
on Feb 01, 2011 at 20:14 UTC


in reply to Re: Confusing UTF-8 bug in CGI-script
in thread Confusing UTF-8 bug in CGI-script

STDIN is used to transfer something that isn't text.

What do you mean by "isn't text"? What else is it? And how do I then transfer the text and make Perl understand that it is UTF-8 encoded?

Strange thing: I have had a full site running in UTF-8 for years, and every CGI script has "use open ':std' => ':encoding(UTF-8)';" at the beginning (pretty much the same init block as in the example above), because without it I could not get anything to work. Now I have copied it to another project, stripped it down to a skeleton, and it does not work anymore. It is too mysterious to me.

Nõnda, WK


Re^3: Confusing UTF-8 bug in CGI-script
by wanradt (Scribe) on Feb 01, 2011 at 20:27 UTC

    Huh, at least I found the difference between the working production code and the script here: in production I initialize the CGI object (inside a BEGIN block) before asking "use open" to decode STDIN. It seems that is the significant difference. Still, I hope you can explain why input from STDIN is not text.

    Nõnda, WK
      The CGI object only reads from STDIN during initialisation. Anything you do to STDIN later won't affect the CGI object.
Re^3: Confusing UTF-8 bug in CGI-script
by ikegami (Pope) on Feb 01, 2011 at 20:50 UTC

    What do you mean by "isn't text"?

    The text you typed into your browser is transformed by it as follows:

    1. It is encoded using the proper character encoding.
    2. Some of it is encoded using percent encoding.
    3. The resulting string is joined to others to form an application/x-www-form-urlencoded document.
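
    As a rough illustration of those three steps, here is a minimal sketch of what a browser does before the data reaches your script. The helper percent_encode and the sample form fields are made up for this example, and real browsers usually write a space as "+" rather than "%20".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a string of bytes.
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Sample form fields (text, i.e. strings of characters).
my %form = (
    name     => "N\x{F5}nda",          # "Nõnda"
    greeting => "tere p\x{E4}evast",   # "tere päevast"
);

my $body = join '&',                    # step 3: join the pairs
    map {
        # step 1: character encoding, step 2: percent encoding
        percent_encode(encode('UTF-8', $_))
            . '='
            . percent_encode(encode('UTF-8', $form{$_}))
    } sort keys %form;

print "$body\n";
```

    The resulting $body contains only ASCII bytes; the original characters are buried under two layers of encoding.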

    That leaves you with something that is no longer your text. The proper inverse of that is:

    1. Split the form data into its components.
    2. Remove any percent encoding.
    3. Remove the character encoding.
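
    The inverse can be sketched by hand as follows. This is only an illustration of the order of the steps, not how CGI.pm is implemented internally, and the sample body is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A raw request body as it might arrive on STDIN: bytes, not text.
my $body = 'greeting=tere+p%C3%A4evast&name=N%C3%B5nda';

my %param;
for my $pair (split /[&;]/, $body) {            # step 1: split into components
    my ($key, $value) = split /=/, $pair, 2;
    for ($key, $value) {
        $_ = '' unless defined;
        tr/+/ /;                                 # "+" stands for a space
        s/%([0-9A-Fa-f]{2})/chr hex $1/ge;       # step 2: remove percent encoding
        $_ = decode('UTF-8', $_);                # step 3: remove character encoding
    }
    $param{$key} = $value;
}

binmode STDOUT, ':encoding(UTF-8)';              # encode text again on output
print "$_ = $param{$_}\n" for sort keys %param;
```

    Note that the character decoding happens last, on each component, and not on the request body as a whole.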

    You're adding an additional step:

    1. Remove the character encoding. (XXX: the extra step, added by "use open")
    2. Split the form data into its components.
    3. Remove any percent encoding.
    4. Remove the character encoding.

    The fourth step notices something is odd and throws an error.
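
    To see why decoding twice is a problem in general, here is a small standalone sketch using Encode directly. It is not the code path CGI.pm takes, and whether a particular request actually triggers an error depends on the data it contains.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

my $bytes = "\xE2\x82\xAC";                # UTF-8 bytes for U+20AC (euro sign)

# Decoding once turns three bytes into one character.
my $text = decode('UTF-8', $bytes);
printf "decoded once: %d character(s), U+%04X\n", length $text, ord $text;

# Decoding the already-decoded text again is a mistake: the input is
# no longer a string of bytes, so there is nothing valid to decode.
my $copy  = $text;                         # FB_CROAK may modify its argument
my $twice = eval { decode('UTF-8', $copy, FB_CROAK) };
my $second_decode_failed = !defined $twice;

print $second_decode_failed
    ? "second decode failed: $@"
    : "second decode returned something\n";
```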

    And how do I then transfer the text and make Perl understand that it is UTF-8 encoded?

    That's what the «-utf8» in «use CGI qw(:all -utf8);» does. "This makes CGI.pm treat all parameters as UTF-8 strings" by passing them to decode.

      That's what the «-utf8» in «use CGI qw(:all -utf8);» does.

      And without "-utf8" I can't use CGI properly, because UTF-8 encoded GET parameters are then treated wrongly? I mean, STDIN I can fix with "use open ..."

      I'm finally getting somewhere... Thank you!

      Nõnda, WK
