
( PDF::EasyPDF ) encoding problem

by lepetitalbert (Monsignor)
on Sep 07, 2009 at 11:29 UTC ( #793929=perlquestion )
lepetitalbert has asked for the wisdom of the Perl Monks concerning the following question:

Hello Dear Monks,

I have a problem with accented characters (...) when creating a PDF with PDF::EasyPDF:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string = "pp";

    use PDF::EasyPDF;
    my $pdf = PDF::EasyPDF->new({file=>"test.pdf",x=>mm(210),y=>mm(297)});
    $pdf->text(mm(20),mm(297-20), $string );
    $pdf->close;
    print $string;

in the pdf the string becomes 'p√p√ '

I tried to add things like :

    use utf8;
    use open ':std' => ':utf8';
    utf8::encode($string);

with no luck

As this encoding stuff is not really clear in my mind, I don't know where to search.

I'm on linux, LANG=fr_CH.UTF-8

Any hint welcome

Have a nice day

"There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

Replies are listed 'Best First'.
Re: ( PDF::EasyPDF ) encoding problem
by almut (Canon) on Sep 07, 2009 at 12:59 UTC

    From a quick look at the PDF::EasyPDF source, I'd say it doesn't have any support for unicode at all...

    So, try to Encode::encode() your $string in IsoLatin1 (aka "iso-8859-1") before you pass it to the ->text() method.  Also, you'd need use utf8; if you have literal strings in your source code (like in your example) and are using a Unicode editor (which I suppose you are, otherwise you wouldn't be getting the results you're currently seeing...). This is required to tell Perl that the source is in UTF-8.

    (Of course this approach would only work for characters that are actually encodable in IsoLatin1, like "é"...)
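A minimal sketch of the conversion suggested above, assuming the script itself is saved as UTF-8. The sample strings and variable names are illustrative and not taken from the original script; encode(), FB_CROAK and LEAVE_SRC are standard parts of the Encode module.

```perl
use strict;
use warnings;
use utf8;                      # literals in this file are UTF-8
use Encode qw(encode);

my $string = "é";              # hypothetical sample text, one character
my $latin1 = encode('iso-8859-1', $string);   # one byte: 0xE9

# Characters outside Latin-1 cannot be converted. With FB_CROAK the
# failure is reported instead of a '?' being substituted silently.
my $euro = "\x{20AC}";         # EURO SIGN, not part of ISO-8859-1
my $ok = eval {
    encode('iso-8859-1', $euro, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};

printf "latin1 bytes: %s\n", unpack('H*', $latin1);
print  "euro encodable in Latin-1: ", ($ok ? "yes" : "no"), "\n";
```

The byte string in $latin1 is what would then be handed to the ->text() method in place of the original $string.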


      You're right moritz, I've already read this article several times but I still cannot say I'm really comfortable with this stuff.

      I thought it was some package limitation, as I usually don't have problems with this ( french speaking area here ), but couldn't confirm it.

      I tried your and almut's solution but no change.

      Thanks and have a nice day.

      "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

        I just had a closer look at the module...  Part of the problem is that PDF::EasyPDF specifies the encoding for its 14 Adobe Base Fonts (all it supports) as "MacRomanEncoding", which is kind of unfortunate, as this encoding is rather different from ISO-8859-1, which Perl defaults to in most cases (for example, "é" is 0x8E in Mac Roman, while it's 0xE9 in ISO-Latin-1 and CP1252).  In other words, even if you had successfully solved the UTF-8 to ISO-8859-1 conversion issue, it still wouldn't work...
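The byte values quoted above can be checked with Encode directly. The sketch below is only an illustration (variable names are made up). Its second half shows what happens when the UTF-8 bytes of an accented character are read as Mac Roman, which would plausibly account for the "√" seen in the original question.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $e_acute = "\x{E9}";        # LATIN SMALL LETTER E WITH ACUTE

# The same character has a different byte value depending on the encoding.
my %byte_for;
for my $enc (qw(MacRoman iso-8859-1 cp1252)) {
    $byte_for{$enc} = uc unpack('H*', encode($enc, $e_acute));
    printf "%-10s 0x%s\n", $enc, $byte_for{$enc};
}

# UTF-8 bytes read as if they were Mac Roman: one accented
# character turns into two unrelated symbols.
my $utf8_bytes = encode('UTF-8', $e_acute);        # bytes C3 A9
my $misread    = decode('MacRoman', $utf8_bytes);  # U+221A U+00A9
printf "misread as: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $misread;
```

U+221A is the square root sign and U+00A9 is the copyright sign.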

        But you can get it working with two small changes to the module source (tested, i.e. works for me):

        1. Replace all 14 occurrences of "/MacRomanEncoding" with "/WinAnsiEncoding" (case is important). Windows ANSI encoding (CP1252) is roughly the equivalent of ISO-8859-1/ISO-8859-15; one of the differences is that the Euro symbol is in a different position (0xA4 in ISO-8859-15, 0x80 in CP1252).

        2. Change this line

          open (EASYPDF,">$self->{file}") or die ...

          to read

          open (EASYPDF, ">:encoding(cp1252)", $self->{file}) or die ...

          (and don't forget to put use utf8; in your script)

        Update: alternatively, you could leave PDF::EasyPDF's /MacRomanEncoding declarations in place, and have Perl convert to that encoding directly (">:encoding(MacRoman)"), which would work, too (except for the Euro symbol)
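To see what the ">:encoding(cp1252)" layer from step 2 does, here is a small sketch that writes through the same layer into an in-memory file instead of a PDF. The sample text and variable names are assumptions made for the demonstration. PDF::EasyPDF is not involved at all.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{E9}\x{20AC}";   # e-acute followed by the Euro sign

# Write through an encoding layer into an in-memory file, so no
# file on disk is needed for the demonstration.
my $buffer = '';
open(my $fh, '>:encoding(cp1252)', \$buffer) or die "open failed: $!";
print {$fh} $text;
close($fh) or die "close failed: $!";

printf "cp1252 bytes:     %s\n", unpack('H*', $buffer);

# For comparison, the Euro sign in ISO-8859-15.
my $euro_8859_15 = encode('iso-8859-15', "\x{20AC}");
printf "iso-8859-15 euro: %s\n", unpack('H*', $euro_8859_15);
```

The layer turns Perl's decoded characters into single CP1252 bytes on output, which is what a font declared with /WinAnsiEncoding expects.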

Re: ( PDF::EasyPDF ) encoding problem
by moritz (Cardinal) on Sep 07, 2009 at 12:28 UTC
    I tried to add things like :
        use utf8;
        use open ':std' => ':utf8';
        utf8::encode($string);

    with no luck

    Don't just randomly add stuff that is related to the problem, but try to understand how encodings are handled in Perl, and act accordingly.

    If PDF::EasyPDF follows the usual model of receiving decoded text strings, and your strings come from literals in the source file, a simple use utf8; should be enough.
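The difference between a decoded text string and raw bytes can be sketched as follows. The byte values are written out explicitly so the example does not depend on how the file is saved. Variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What Perl sees for an accented literal typed in a UTF-8 editor
# when "use utf8;" is absent: two separate bytes.
my $bytes = "\xC3\xA9";

# What it sees with "use utf8;" (or after an explicit decode):
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;   # 2
printf "chars: length %d\n", length $chars;   # 1

# utf8::encode goes in the opposite direction (characters to bytes),
# so applying it to text that is already bytes encodes it twice.
my $twice = $bytes;
utf8::encode($twice);
printf "double-encoded: length %d\n", length $twice;   # 4
```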

    But this paragraph in the documentation is somewhat discouraging:


    None known, but the methods do relatively little sanity checking, and there is absolutely no encoding yet for text (so it's probably impossible to print parentheses, for example).

    So if you can't even print all ASCII characters, I'd be surprised if it worked reliably for non-ASCII characters.
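For context on the parentheses remark: in a PDF literal string, a backslash and any unbalanced parenthesis have to be escaped with a backslash, and escaping every parenthesis is valid too. A minimal sketch of such escaping follows. pdf_escape is a hypothetical helper, not something PDF::EasyPDF provides.

```perl
use strict;
use warnings;

# pdf_escape is a hypothetical helper, not a PDF::EasyPDF function.
# It backslash-escapes the three characters that are special inside
# a PDF literal string: "(", ")" and "\".
sub pdf_escape {
    my ($text) = @_;
    $text =~ s/([\\()])/\\$1/g;
    return $text;
}

print pdf_escape('f(x) = 1'), "\n";   # f\(x\) = 1
```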

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: ( PDF::EasyPDF ) encoding problem
by grantm (Parson) on Sep 07, 2009 at 21:42 UTC

    I've never used PDF::EasyPDF but I do know that PDF and Unicode are not inherently friendly. PDF has knowledge of a number of 'built-in' fonts. These fonts are all addressed using Latin-1 (or variants of Latin-1 like Mac Roman).

    One implication of this is that if the characters you want to print are included in iso-8859-1 (your example suggests they are) then you might get away with passing PDF::EasyPDF a Latin-1 encoded string rather than Perl's native UTF-8 strings.

    A second implication is that if the characters you want to print are not included in iso-8859-1 then it will be necessary to embed a font in your document. Embedded fonts can be addressed in a way that allows access to non Latin-1 characters.

    The PDF::Reuse module supports embedding TrueType fonts and transparently converting from native Perl strings to a PDF encoding.


      Thanks almut for your debugging and solution. I will try this.

      Thanks grantm for the details.

      I tried PDF::Create and encoding the string as latin1, and it worked.

      Have a nice day

      "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates
