PerlMonks  

Problem with url escaping in Pod::Simple::HTML and UTF-8

by xenu (Novice)
on Jun 26, 2011 at 10:14 UTC ( [id://911437]=perlquestion )

xenu has asked for the wisdom of the Perl Monks concerning the following question:

Hi! In the following code there's a problem with URL escaping:

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Pod::Simple::HTML
my $pod_html_output;
my $content = <<CONTENT;
=head1 foobar

L<<utf-8 letter (a with ogonek) which i can't paste here because perlmonks doesn't support unicode>>>

=cut
CONTENT

my $pod_html = Pod::Simple::HTML->new;
$pod_html->output_string(\$pod_html_output);
$pod_html->parse_string_document($content);
print $pod_html_output;

'<utf-8 letter (a with ogonek) which i can't paste here because perlmonks doesn't support unicode>' is escaped to '%25105' instead of '%C4%85'. I have tried overriding general_url_escape(), but then I got even stranger results:
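For reference, these values can be reproduced without Pod::Simple at all. The sketch below assumes the letter is U+0105 (lowercase a with ogonek); it is only a guess at where the odd values might come from, not a trace of the module's internals. Formatting the code point directly gives '%105', escaping the '%' of that a second time gives '%25105', and escaping the UTF-8 bytes gives the expected '%C4%85'.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the letter in question is U+0105 (a with ogonek).
my $char = "\x{0105}";

# Escaping the code point itself, as a byte-oriented escaper would.
my $by_codepoint = sprintf '%%%02X', ord $char;

# Escaping the '%' of that result a second time.
(my $double = $by_codepoint) =~ s/%/%25/g;

# Encoding to UTF-8 bytes first, then escaping each byte.
my $by_bytes = join '',
    map { sprintf '%%%02X', $_ }
    unpack 'C*', encode('UTF-8', $char);

print "$by_codepoint\n";   # %105
print "$double\n";         # %25105
print "$by_bytes\n";       # %C4%85
```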

{
    package Foobar::Pod::Simple::HTML;
    use base qw(Pod::Simple::HTML);
    use CGI::Util qw(escape);

    sub general_url_escape {
        my($self, $string) = @_;
        $string = escape($string);
        return $string;
    }
}

In this case '<utf-8 letter (a with ogonek) which i can't paste here because perlmonks doesn't support unicode>' was escaped to '%105' (wtf?). Thanks in advance for help.
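One possible variation on that override is to encode the string to UTF-8 bytes before escaping, so that each byte gets its own %XX. This is a rough sketch only: the package name My::Pod::HTML is made up for illustration, and it assumes general_url_escape() is handed decoded characters rather than raw bytes, which this thread does not confirm.

```perl
use strict;
use warnings;

{
    # Hypothetical subclass name, for illustration only.
    package My::Pod::HTML;
    use base qw(Pod::Simple::HTML);
    use Encode qw(encode);

    sub general_url_escape {
        my ($self, $string) = @_;

        # Assumption: $string holds decoded characters, not UTF-8 bytes.
        my $bytes = encode('UTF-8', $string);

        # Percent-escape every byte outside the unreserved URL set.
        $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
        return $bytes;
    }
}

# The method does not use $self, so it can be called on the class directly.
print My::Pod::HTML->general_url_escape("\x{0105}"), "\n";   # %C4%85
```

If the parser hands over bytes that are already UTF-8 encoded, this would double-encode them, so it is worth checking what the method actually receives before relying on it.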

Replies are listed 'Best First'.
Re: Problem with url escaping in Pod::Simple::HTML and UTF-8
by choroba (Cardinal) on Jun 26, 2011 at 11:11 UTC
    A semicolon is missing after use Pod::Simple::HTML.
    I also get this warning when running your (fixed) code:
    Malformed UTF-8 character (unexpected continuation byte 0xbe, with no preceding start byte) at 911435.perl line 10.

    Update: This warning remains even if I only keep this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    my $content = <<CONTENT;
    =head1 foobar

    L<<utf-8 letter (a with ogonek) which i can't paste here because perlmonks doesn't support unicode>>>

    =cut
    CONTENT
Re: Problem with url escaping in Pod::Simple::HTML and UTF-8
by Anonymous Monk on Jun 26, 2011 at 11:41 UTC
    utf-8 letter (a with ogonek) which i can't paste here because perlmonks doesn't support unicode

    :) Perl is perl :) So U+0104 Ą Latin Capital Letter A with ogonek

    print <<"__UTF__";
    \N{U+0104}
    \x{0104}
    __UTF__
    ## this needs
    ## use charnames ':full';
    ## \N{Latin Capital Letter A with ogonek}
Re: Problem with url escaping in Pod::Simple::HTML and UTF-8
by Anonymous Monk on Jun 26, 2011 at 12:34 UTC
    (wtf?). Thanks in advance for help.

    I can confirm this; there are about 3 bugs here.

    Adding

    =encoding utf8
    makes ::HTML die with
    Cannot decode string with wide characters at C:/perl/5.12.2/lib/MSWin32-x86-multi-thread/Encode.pm line 176.

    Pod::Simple::XHTML generates

    <p><a href="http://search.cpan.org/perldoc?&amp;#x104;">&#x104;</a></p>
    Both ::HTML and ::XHTML generate
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    regardless of BOM or
    =encoding utf8
    which doesn't kill ::XHTML

    This is probably related to Bug #24820 for Pod-Simple: Pod encoding, unicode issues

    You should report the bug upstream :)

Re: Problem with url escaping in Pod::Simple::HTML and UTF-8
by xenu (Novice) on Jun 26, 2011 at 14:56 UTC
    Problem solved by Rhomboid@reddit here.

      If you are crossposting, please at least have the courtesy to say so in your original post. I consider it rude to crosspost questions without telling people where else you have asked the question.

Node Type: perlquestion [id://911437]
Approved by moritz
Front-paged by Corion