PerlMonks  

( PDF::EasyPDF ) encoding problem

by lepetitalbert (Monsignor)
on Sep 07, 2009 at 11:29 UTC (#793929)
lepetitalbert has asked for the wisdom of the Perl Monks concerning the following question:

Hello Dear Monks,

I have a problem with accented characters (...) when creating a PDF with PDF::EasyPDF:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF::EasyPDF;

    my $string = "pp";

    # A4 page: 210 x 297 mm
    my $pdf = PDF::EasyPDF->new({file=>"test.pdf",x=>mm(210),y=>mm(297)});
    # Write the string near the top-left corner of the page
    $pdf->text(mm(20),mm(297-20), $string);
    $pdf->close;
    print $string;

In the PDF, the string becomes 'p√p√ '.

I tried adding things like:

    use utf8;
    use open ':std' => ':utf8';
    utf8::encode($string);

with no luck.

As this encoding stuff is not really clear in my mind, I don't know where to look.

I'm on Linux, LANG=fr_CH.UTF-8.

Any hints welcome.

Have a nice day

"There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

Re: ( PDF::EasyPDF ) encoding problem
by moritz (Cardinal) on Sep 07, 2009 at 12:28 UTC
    I tried to add things like :
    use utf8; use open ':std' => ':utf8'; utf8::encode($string);

    with no luck

    Don't just randomly add stuff that is related to the problem, but try to understand how encodings are handled in Perl, and act accordingly.

    If PDF::EasyPDF follows the usual model of receiving decoded text strings, and your strings come from literals in the source file, a simple use utf8; should be enough.
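
    To make that concrete, here is a minimal sketch of what use utf8; changes. To keep the file pure ASCII, it writes out by hand the two bytes a UTF-8 editor saves for "é", which is what Perl sees in the literal without use utf8;. It then compares them with the single character Perl sees when use utf8; is in effect. The variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Without "use utf8;", a literal "é" saved by a UTF-8 editor reaches Perl
# as the two raw bytes 0xC3 0xA9 (a byte string, not text).
my $undecoded = "\xC3\xA9";

# With "use utf8;", the same literal becomes one character, U+00E9.
my $decoded = "\x{e9}";

# decode() turns the byte string into the text string "use utf8;" would give.
my $converted = decode('UTF-8', $undecoded);

printf "bytes: %d, characters: %d\n", length($undecoded), length($decoded);
print $converted eq $decoded ? "same text\n" : "different\n";
```

    decode('UTF-8', ...) does the same job at run time for strings that come from files or other input rather than from literals.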

    But this paragraph in the documentation is somewhat discouraging:

    BUGS

    None known, but the methods do relatively little sanity checking, and there is absolutely no encoding yet for text (so it's probably impossible to print parentheses, for example).

    So if you can't even print all ASCII characters, I'd be surprised if it worked reliably for non-ASCII characters.
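
    Some background on the parentheses remark: in a PDF content stream, text is normally shown with a literal string delimited by parentheses, as in (Hello) Tj. An unbalanced parenthesis or a backslash inside the text must therefore be escaped with a backslash, and escaping every one of them is always safe. The following is a minimal sketch of that escaping. It assumes the text ends up in such a literal string, and pdf_literal() is a hypothetical helper, not part of PDF::EasyPDF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::EasyPDF): wrap a byte string in a
# PDF literal string, escaping the characters that are special inside it.
sub pdf_literal {
    my ($text) = @_;
    (my $escaped = $text) =~ s/([\\()])/\\$1/g;   # \ ( )  become  \\ \( \)
    return "($escaped)";
}

my $op = pdf_literal('f(x) = 1\2') . ' Tj';
print "$op\n";    # prints: (f\(x\) = 1\\2) Tj
```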

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: ( PDF::EasyPDF ) encoding problem
by almut (Canon) on Sep 07, 2009 at 12:59 UTC

    From a quick look at the PDF::EasyPDF source, I'd say it doesn't have any Unicode support at all...

    So, try to Encode::encode() your $string in IsoLatin1 (aka "iso-8859-1") before you pass it to the ->text() method. Also, you'd need use utf8; if you have literal strings in your source code (as in your example) and are using a Unicode editor (which I suppose you are, otherwise you wouldn't be getting the results you're currently seeing...). This is required to tell Perl that the source is in UTF-8.

    (Of course this approach would only work for characters that are actually encodable in IsoLatin1, like ""...)
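
    A minimal sketch of that suggestion, using the core Encode module. "\x{e9}" stands in for an accented literal so that the file stays ASCII; with real accented characters in the source you would add use utf8; as described above. The resulting byte string is what you would hand to ->text(); the PDF::EasyPDF call itself is omitted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "p\x{e9}p\x{e9}";    # character string, as "use utf8;" would produce

# Encode to ISO-8859-1: one byte per character, so é becomes 0xE9.
my $latin1 = encode('iso-8859-1', $text);

# Characters outside Latin-1 cannot be represented; by default encode()
# substitutes '?' for them.
my $euro = encode('iso-8859-1', "\x{20ac}");

printf "%vX\n", $latin1;    # 70.E9.70.E9
print "$euro\n";            # ?
```

    The last line shows the limitation: by default, encode() quietly replaces anything outside Latin-1 with "?".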

      Hi,

      You're right, moritz. I've already read this article several times, but I still can't say I'm really comfortable with this stuff.

      I thought it might be a limitation of the package, as I usually don't have problems with this (French-speaking area here), but I couldn't confirm it.

      I tried both your solution and almut's, but nothing changed.

      Thanks and have a nice day.


        I just had a closer look at the module... Part of the problem is that PDF::EasyPDF specifies the encoding for its 14 Adobe Base Fonts (the only fonts it supports) as "MacRomanEncoding". That is unfortunate, because this encoding is rather different from ISO-8859-1, which Perl defaults to in most cases (for example, "é" is 0x8E in Mac Roman, while it's 0xE9 in ISO-Latin-1 and CP1252). In other words, even if you had successfully solved the UTF-8 to ISO-8859-1 conversion issue, it still wouldn't work...
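
        To check the byte values mentioned above, here is a short sketch using the core Encode module (MacRoman is one of the single-byte encodings it ships with). It also decodes byte 0xC3, the first UTF-8 byte of é, as Mac Roman.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $e_acute = "\x{e9}";    # é as a Perl character

# The byte each encoding uses for é
my %byte_for;
$byte_for{$_} = encode($_, $e_acute) for qw(MacRoman iso-8859-1 cp1252);
printf "%-10s 0x%02X\n", $_, ord $byte_for{$_} for sort keys %byte_for;

# What a Mac Roman font shows for byte 0xC3, the first UTF-8 byte of é
my $shown = decode('MacRoman', "\xC3");
printf "0xC3 in Mac Roman is U+%04X\n", ord $shown;    # U+221A (square root sign)
```

        Every character from U+00C0 to U+00FF (À to ÿ, which includes the common French accented letters) starts with byte 0xC3 in UTF-8, and that byte is the square root sign √ in Mac Roman. That is consistent with the 'p√p√' output in the original question, where undecoded UTF-8 bytes end up in a font declared as MacRomanEncoding.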

        But you can get it working with two small changes to EasyPDF.pm (tested, i.e. it works for me):

        1. Replace all 14 occurrences of "/MacRomanEncoding" with "/WinAnsiEncoding" (case is important). Windows ANSI encoding (CP1252) is roughly equivalent to ISO-8859-1/ISO-8859-15; one of the differences is that the Euro symbol is in a different position (0xA4 in ISO-8859-15, 0x80 in CP1252).

        2. Change this line

          open (EASYPDF,">$self->{file}") or die ...

          to read

          open (EASYPDF, ">:encoding(cp1252)", $self->{file}) or die ...

          (and don't forget to put use utf8; in your script)
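
        Change 2 relies on a PerlIO :encoding(cp1252) layer, which converts Perl characters to CP1252 bytes as they are written. Below is a minimal sketch. It writes into an in-memory buffer instead of $self->{file}, so the resulting bytes can be inspected, and it also shows where the Euro sign lands in ISO-8859-15.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{e9}\x{20ac}";    # é and the Euro sign, as Perl characters

# Write through an encoding layer into an in-memory buffer
# (stands in for the PDF file opened in EasyPDF.pm).
my $buffer = '';
open(my $fh, '>:encoding(cp1252)', \$buffer) or die "Cannot open in-memory file: $!";
print {$fh} $text;
close($fh) or die "Cannot close: $!";

printf "cp1252 bytes:      %vX\n", $buffer;                          # E9.80
printf "iso-8859-15 bytes: %vX\n", encode('iso-8859-15', $text);    # E9.A4
```

        Because the layer does the conversion, whatever is printed to the handle has to be a character string. That is why use utf8; is needed in the script when the text comes from literals.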

        Update: alternatively, you could leave PDF::EasyPDF's /MacRomanEncoding declarations in place and have Perl convert to that encoding directly (">:encoding(MacRoman)"), which would work too (except for the Euro symbol).

Re: ( PDF::EasyPDF ) encoding problem
by grantm (Parson) on Sep 07, 2009 at 21:42 UTC

    I've never used PDF::EasyPDF, but I do know that PDF and Unicode are not inherently friendly. PDF knows about a number of 'built-in' fonts, and these fonts are all addressed using Latin-1 (or variants of Latin-1, like Mac Roman).

    One implication of this is that if the characters you want to print are included in ISO-8859-1 (your example suggests they are), you might get away with passing PDF::EasyPDF a Latin-1 encoded string rather than a native Perl character string (which Perl may store internally as UTF-8).

    A second implication is that if the characters you want to print are not included in ISO-8859-1, you will need to embed a font in your document. Embedded fonts can be addressed in a way that allows access to non-Latin-1 characters.

    The PDF::Reuse module supports embedding TrueType fonts and transparently converting from native Perl strings to a PDF encoding.

      Hi,

      Thanks, almut, for debugging this and for the solution. I will try it.

      Thanks, grantm, for the details.

      I tried PDF::Create with the string encoded as Latin-1, and it worked.

      Have a nice day


Node Type: perlquestion [id://793929]
Approved by Perlbotics