 
PerlMonks  

utf8 problems with Pod-ProjectDocs

by LanX (Saint)
on Sep 24, 2012 at 17:37 UTC (id://995425)

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I'm not sure if it's a bug or if I'm missing something...

My employer uses Pod::ProjectDocs to render CPAN-like HTML documentation for our modules.

But whatever I do, German umlauts in UTF-8 are rendered wrongly!

The HTML source shows

    <p>
    <p>ä</p>
    <p>ö</p>
    <p>ü</p>
    </div>

while the POD has

    =pod

    =encoding utf8

    ä

    ö

    ü

    =cut
I've checked the meta tag; it shows content="text/html; charset=UTF-8" />

Adding or omitting =encoding utf8 makes no difference; the only result is a warning:

    Unrecognized command 'encoding' skipped at /home/rolf/projects/cgi-bin/GLOBAL/DB/gedit_utf.pm line 1
Any ideas?

Cheers Rolf

Re: utf8 problems with Pod-ProjectDocs
by remiah (Hermit) on Sep 25, 2012 at 01:24 UTC

    I couldn't fully trace this, nor do I understand all of it. If you want to try something, a dirty 4-line patch to ProjectDocs::Parse.pm seems to work for me.

    1. Decode $self->{buffer} into characters. Add 2 lines

        use Encode qw/decode/;
        $self->{buffer} = decode('UTF-8', $self->{buffer});

    just after the line

        $self->{buffer} = join "\n", qq[<div class="pod">], $self->{buffer}, "</div>";

    which is in the "end_pod" sub.

    2. Encode $self->{buffer} back. Add 2 lines

        use Encode qw/encode/;
        $self->{buffer} = encode('UTF-8', $self->{buffer});

    just after

        $self->SUPER::parse_from_file(@_);

    which is in the "parse_from_file" sub.

    Before SUPER::parse_from_file the buffer is fine; after SUPER::parse_from_file it is garbled. "end_pod" is a callback from Pod::Parser.
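    A small standalone sketch of what the two patched lines do to a buffer, assuming the buffer holds raw UTF-8 bytes because the .pm file was read without an encoding layer. The buffer contents are made up; only the Encode calls mirror the patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up buffer, built like the one in end_pod: "ä" is still the two
# raw UTF-8 bytes 0xC3 0xA4 because nothing decoded the input.
my $buffer = join "\n", qq[<div class="pod">], "<p>\xC3\xA4</p>", "</div>";

# Step 1: decode the byte buffer into a character string
# (the two bytes 0xC3 0xA4 become the single character U+00E4).
my $chars = decode('UTF-8', $buffer);

# Step 2: encode the characters back into UTF-8 bytes for output.
my $bytes = encode('UTF-8', $chars);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($buffer), length($chars), length($bytes);
print $bytes eq $buffer ? "round trip is lossless\n" : "round trip changed the data\n";
```

    Because the two calls are inverses, what matters is which stage of the pipeline sees characters and which sees bytes, which is why the placement of the pair decides whether the output comes out right.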

      Thanks!

      I recommended your patch to the maintainer of our installation. =)

      Cheers Rolf

      I think you got it the wrong way round: encode(...) in end_pod and decode(...) in parse_from_file works for me.

        Hello

        I re-ran the test I used at the time, and now I see no problems with Pod::ProjectDocs without the patch.
        I wonder whether this is due to a fix in Pod::ProjectDocs, in Pod::Parser, or just my dream???
        If you post your test case, I would like to check it. Mine was as follows, and again, I see no problem now...

        test.pl

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use utf8;
        use lib './Pod-ProjectDocs-0.40/lib';
        use Pod::ProjectDocs;

        my $pd = Pod::ProjectDocs->new(
            outroot => './out',
            libroot => './lib',
            title   => 'プロジェクト',
            lan     => 'UTF-8',
        );
        print "before gen\n";
        $pd->gen();
        And the target script in "lib":

        #!/usr/bin/perl

        =pod

        =head1 For POD TEST (ä)

        ä ö ü

        =cut
        regards
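        To compare results without eyeballing the browser, a sketch along these lines could scan the generated HTML under the outroot directory (./out in the test above). find_double_encoded and the script name check_out.pl are hypothetical, and the pattern only catches double-encoded characters in the U+0080 to U+00FF range, which covers the umlauts.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: return the HTML files under $dir that contain
# double-encoded UTF-8 for U+0080..U+00FF. Such a character is stored as
# 0xC2/0xC3 plus a continuation byte; encoding those bytes a second time
# turns the lead byte into 0xC3 0x82 / 0xC3 0x83 and the continuation
# byte into 0xC2 0x80..0xBF (e.g. "ä" becomes 0xC3 0x83 0xC2 0xA4).
sub find_double_encoded {
    my ($dir) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.html?\z/;
        # Read raw bytes so the regex below matches bytes, not characters.
        open my $fh, '<:raw', $_ or die "open $File::Find::name: $!";
        my $html = do { local $/; <$fh> };
        close $fh;
        push @hits, $File::Find::name
            if $html =~ /\xC3[\x82\x83]\xC2[\x80-\xBF]/;
    }, $dir);
    return sort @hits;
}

# Usage (hypothetical script name): perl check_out.pl ./out
if (@ARGV) {
    print "$_\n" for find_double_encoded($ARGV[0]);
}
```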

Re: utf8 problems with Pod-ProjectDocs
by Anonymous Monk on Sep 25, 2012 at 00:32 UTC

    It is a bug: Pod::ProjectDocs doesn't account for encoding, and it doesn't binmode its input files.
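    Pod::ProjectDocs' own I/O code isn't shown here, so the following is only a generic sketch of the effect being described, assuming a POD file saved as UTF-8. Read without a layer, "ä" arrives as two bytes; read through :encoding(UTF-8) it arrives as one character; and encoding the undecoded bytes again on output yields the double-encoded "Ã¤".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Write a tiny POD file containing "ä" as UTF-8 bytes, as an editor would save it.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                       # raw bytes, no translation
print {$fh} "=pod\n\n\xC3\xA4\n\n=cut\n";
close $fh or die "close: $!";

# Read it back without an encoding layer: each line is a byte string.
open my $raw, '<', $path or die "open: $!";
my @raw_lines = <$raw>;
close $raw;

# Read it back with an explicit decoding layer: each line is characters.
open my $dec, '<:encoding(UTF-8)', $path or die "open: $!";
my @char_lines = <$dec>;
close $dec;

chomp(my $raw_umlaut  = $raw_lines[2]);
chomp(my $char_umlaut = $char_lines[2]);

# Encoding the undecoded bytes a second time on output produces the
# classic double-encoded mojibake: 0xC3 0x83 0xC2 0xA4, shown as "Ã¤".
my $double = encode('UTF-8', $raw_umlaut);

printf "raw: %d bytes, decoded: %d char, double-encoded: %s\n",
    length($raw_umlaut), length($char_umlaut),
    join ' ', map { sprintf '%02X', ord } split //, $double;
```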

Re: utf8 problems with Pod-ProjectDocs
by sundialsvc4 (Abbot) on Sep 24, 2012 at 23:05 UTC

    Well, I admit that I might be shooting in the dark here, but ...

    In perldoc perlpod I read the following caveat:

    And don’t forget, when using any command, that the command lasts up until the end of its paragraph, not its line. So in the examples below, you can see that every command needs the blank line after it, to end its paragraph.

    (... which sure looks to me like one of those “doh-h-h!!” things that I am all too familiar with.)
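    To make the paragraph rule concrete, here is a simplified sketch that splits POD source on blank lines. pod_paragraphs is a hypothetical helper, not a real POD parser; it only illustrates that, by the rule quoted above, a snippet without blank lines is a single paragraph, so =encoding never starts a command paragraph of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: split POD source into paragraphs the way perlpod
# describes them, i.e. a paragraph ends at one or more blank lines,
# not at the end of a line. (Simplified: no verbatim or =begin handling.)
sub pod_paragraphs {
    my ($pod) = @_;
    return grep { length } split /\n(?:[ \t]*\n)+/, $pod;
}

my $without_blanks = "=pod\n=encoding utf8\n\x{E4}\n=cut\n";
my $with_blanks    = "=pod\n\n=encoding utf8\n\n\x{E4}\n\n=cut\n";

for my $src ($without_blanks, $with_blanks) {
    my @paras    = pod_paragraphs($src);
    my @commands = grep { /\A=encoding\b/ } @paras;
    printf "%d paragraph(s), =encoding starts a paragraph: %s\n",
        scalar @paras, @commands ? 'yes' : 'no';
}
```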

    I also notice from the documentation for Pod-ProjectDocs that the charset can also be specified there as a parameter.

    Now, my big question is (and it is one that I don’t know the answer to): does the UTF-8 character set support the German umlaut?

    Your quotation of the meta-tag ... well, I shall merely assume that there is a closing quote after text/html. But, assuming this to be the case, I see that it does specify UTF-8, which rather makes me suspect that this encoding does not contain the character you seek. And if that be the case, then this is the real reason why the display is turning out wrongly.

      Now, my big question is (and it is one that I don’t know the answer to): does the UTF-8 character set support the German umlaut?

      Sure it does. Unicode contains all characters available in ISO-8859-1, and ISO-8859-1 contains umlauts. I think it is quite safe to assume that Unicode supports nearly all written languages currently in use, and some ancient ones.
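      A short sketch with the core Encode module illustrating the point: the umlauts (and ß) carry the same numbers as Unicode code points as they do as ISO-8859-1 bytes, and UTF-8 simply stores each of them as two bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# German umlauts and sharp s: their Unicode code points equal their
# ISO-8859-1 byte values, because Unicode's first 256 code points
# are ISO-8859-1.
my @chars = ("\x{E4}", "\x{F6}", "\x{FC}", "\x{C4}", "\x{D6}", "\x{DC}", "\x{DF}");

for my $c (@chars) {
    my $latin1 = encode('ISO-8859-1', $c);   # one byte
    my $utf8   = encode('UTF-8', $c);        # two bytes
    printf "U+%04X  latin1: %s  utf-8: %s\n",
        ord $c,
        join(' ', map { sprintf '%02X', ord } split //, $latin1),
        join(' ', map { sprintf '%02X', ord } split //, $utf8);
}
```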

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Front-paged by Corion