I then run the usual on it ... It should be simple!

There is a big assumption out there that this stuff is so easy that you can bypass the standard libraries. That assumption only holds if you know the dozens of related RFCs inside and out, and if you do, you are going to lean on someone else's implementation anyway, because it will be functionally near-identical to anything you'd write.

Using either of the standard modules (CGI, URI::Escape) for this kind of thing would have saved you all that lost time, and will save plenty more in the future.

perl -MURI::Escape -le 'print uri_unescape("%C2%A3")'
£
perl -MCGI=param -le 'print param("q")' "q=%C2%A3"
£


Eliya rightly points out that I was missing the point. So, here's a bit more of an answer instead of a knee-jerk "use the CPAN." I am assuming the output is meant for the web, though this isn't actually stated in the OP.

Plack is necessary for this but makes it super easy to try stuff, so...

Plain uri_unescape, and therefore the original code snippet, is fine if you are sending the output as bytes that are UTF-8, not as Perl decoded strings. The response is fine because it's undecoded bytes.

plackup -e 'use URI::Escape; sub { [200, ["Content-Type" => "text/html; charset=utf-8"], [ uri_unescape("%C2%A3") ]]}'
HTTP::Server::PSGI: Accepting connections at http://0:5000/
--
£
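
To make the bytes-versus-decoded distinction concrete, here is a minimal sketch that inspects both forms outside of any web server. The variable names are mine, purely for illustration; the only calls used are uri_unescape from URI::Escape and decode from Encode.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);
use Encode qw(decode);

# uri_unescape hands back raw octets: here the two bytes 0xC2 0xA3.
my $bytes = uri_unescape("%C2%A3");

# decode interprets those octets as UTF-8 and yields one character, U+00A3.
my $chars = decode("UTF-8", $bytes);

printf "bytes: length %d, hex %s\n", length($bytes), unpack("H*", $bytes);
printf "chars: length %d, code point U+%04X\n", length($chars), ord($chars);
```

The first form can go straight into a response body; the second is what you want for regular expressions and length checks, and has to be encoded again before it leaves the program.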

Now with decoding to Perl's character strings. It doesn't work because the output needs to be encoded to bytes, and you'll generally get errors or warnings to that effect.

plackup -e 'use Encode; use URI::Escape; sub { [200, ["Content-Type" => "text/html; charset=utf-8"], [ decode("UTF-8", uri_unescape("%C2%A3")) ]]}'
HTTP::Server::PSGI: Accepting connections at http://0:5000/
--
Error: Body must be bytes and should not contain wide characters (UTF-8 strings) at /usr/local/lib/perl5/site_perl/5.14.0/Plack/Middleware/ line 153

Now double encoded, just to see, because it seems to crop up a lot when mixing approaches.

plackup -e 'use Encode; use URI::Escape; sub { [200, ["Content-Type" => "text/html; charset=utf-8"], [ encode("UTF-8", uri_unescape("%C2%A3")) ]]}'
HTTP::Server::PSGI: Accepting connections at http://0:5000/
--
Â£
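
For anyone who wants to see what double encoding does at the byte level, this is a small sketch along the same lines (variable names are illustrative, not from the original snippet). Encoding octets that are already UTF-8 treats each octet as a Latin-1 character and encodes it again, so two bytes become four.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);
use Encode qw(encode);

# Already UTF-8 octets: c2 a3.
my $once = uri_unescape("%C2%A3");

# Encoding again treats 0xC2 and 0xA3 as the characters U+00C2 and U+00A3
# and turns each of them into its own two-byte UTF-8 sequence.
my $twice = encode("UTF-8", $once);

printf "once:  %s\n", unpack("H*", $once);
printf "twice: %s\n", unpack("H*", $twice);
```

A client reading the four-byte result as UTF-8 sees two characters where one was intended, which is the usual symptom when approaches get mixed.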

And improved/corrected versions of the CGI example. With the -utf8 argument, CGI will automatically decode things for you. This is what you want so you can deal with content correctly in regular expressions and such. It's your responsibility to make sure the output handle is UTF-8 or that you encode to bytes. In the first example the character is right in Perl but wrong for the output layer.

perl -MCGI=param,-utf8 -le 'print param("q")' "q=%C2%A3"
?

With -CO to put UTF-8 on the output layer, it works fine.

perl -CO -MCGI=param,-utf8 -le 'print param("q")' "q=%C2%A3"
£

Or, encoding the characters to UTF-8 bytes.

perl -MEncode -MCGI=param,-utf8 -le 'print encode("UTF-8", param("q"))' "q=%C2%A3"
£
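
The two fixes can also be compared side by side in a script. This is only a sketch: $char stands in for what param("q") returns under -utf8, and an in-memory filehandle with an :encoding(UTF-8) layer stands in for STDOUT, which is roughly what -CO arranges on the command line. Both names and the in-memory handle are my assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for param("q") with -utf8: a single decoded character, U+00A3.
my $char = decode("UTF-8", "\xC2\xA3");

# Option 1: encode explicitly before printing to a plain byte handle.
my $explicit = encode("UTF-8", $char);

# Option 2: let an encoding layer on the handle do the work.
my $layered = "";
open my $fh, ">:encoding(UTF-8)", \$layered or die "open failed: $!";
print {$fh} $char;
close $fh;

printf "explicit: %s\n", unpack("H*", $explicit);
printf "layered:  %s\n", unpack("H*", $layered);
```

Either way the same two bytes reach the outside world. The thing to avoid is doing both at once, which gets you back to double encoding.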

Anyway, the first answers in the thread were, taken together, all quite thorough. This was just to have a little to play with and recant my grumpy and erroneous first stab.

In reply to Re: UTF8 URI Escaping by Your Mother
in thread UTF8 URI Escaping by snoopy20
