Re: Write special chars to PDF. UTF8?

G'day tel2,

"I'm guessing I might need to "use utf8", ..."

Sorry, but that would be a bad guess. The documentation for the utf8 pragma states, in emboldened text:

"Do not use this pragma for anything else than telling Perl that your script is written in UTF-8."

Your basic problem here is that the filehandle, FILE, doesn't know about the UTF-8. Example of what's happening:

$ perl -Mutf8 -wE 'say "e-acute: é; u-acute: ú"'
e-acute: ?; u-acute: ?
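A minimal sketch of the distinction at work here, using only the core Encode module. The variable names are illustrative, and the character is written as an escape so the snippet does not depend on how the source file itself is saved:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: LATIN SMALL LETTER E WITH ACUTE (U+00E9).
my $chars = "\x{e9}";

# The same character encoded as UTF-8: two octets, 0xC3 0xA9.
my $bytes = encode('UTF-8', $chars);

printf "characters: %d, octets: %d\n", length $chars, length $bytes;
printf "octets (octal): %s\n",
    join ' ', map { sprintf '%03o', ord $_ } split //, $bytes;
```

A filehandle with no encoding layer has no way to know that the first form should be turned into the second on output.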

Here are three ways to address this problem:

Use the binmode function, e.g.

$ perl -Mutf8 -wE 'binmode STDOUT => ":utf8"; say "e-acute: é; u-acute: ú"'
e-acute: é; u-acute: ú

Use the open pragma, e.g.

$ perl -Mutf8 -wE 'use open OUT => qw{:utf8 :std}; say "e-acute: é; u-acute: ú"'
e-acute: é; u-acute: ú

Use the 3-argument form of the open function and specify the encoding in the mode. Something like this:
```
open my $fh, '>:encoding(UTF-8)', $filename
```
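As a rough illustration of that third option, here is a sketch that writes a string through an `:encoding(UTF-8)` layer and reads it back twice, once as raw octets and once as characters. The scratch file from File::Temp and all variable names are assumptions made for the example, not part of your script:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "Clich\x{e9}.";    # 7 characters, e-acute written as an escape

# Hypothetical scratch file; removed automatically at exit.
my (undef, $filename) = tempfile(UNLINK => 1);

# Write characters; the layer encodes them to UTF-8 octets on the way out.
open my $out, '>:encoding(UTF-8)', $filename
    or die "Can't open $filename for writing: $!";
print {$out} $text;
close $out or die "Can't close $filename: $!";

# Read the file back as raw octets to see what is actually on disk.
open my $raw, '<:raw', $filename
    or die "Can't open $filename for reading: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# Read it back through the same layer to get characters again.
open my $in, '<:encoding(UTF-8)', $filename
    or die "Can't open $filename for reading: $!";
my $round_trip = do { local $/; <$in> };
close $in;

printf "characters written: %d\n", length $text;
printf "octets on disk:     %d\n", length $octets;
print  "round trip matches\n" if $round_trip eq $text;
```

The extra octet on disk is the two-octet UTF-8 form of the single e-acute character.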

Here are some recommendations for your code. These are unrelated to the UTF-8 issue.

Let Perl tell you about problems. Start using the strict pragma and the warnings pragma.
Your code is littered with package variables: $cgi, an object reference; $f1, a string; FILE, a filehandle; and so on. These are all global and suffer from the same problems as all global variables. Start using lexical variables, and control their scope, for far less error-prone code. There's a lot of information about this in perlsub; the "Private Variables via my()" section would be a good place to start.
Don't use indirect object syntax, e.g. code like new CGI. Here's what perlobj: Invoking Class Methods says, in emboldened text, at the start of the Indirect Object Syntax section:
"Outside of the file handle case, use of this syntax is discouraged as it can confuse the Perl interpreter. See below for more details."
Start using lexical filehandles with the 3-argument form of the open function. See the documentation for open for more about this.
Hand-crafting I/O die messages is time-consuming and error-prone. Let Perl do this task for you with the autodie pragma. You can then write code like this:
```
use autodie;
...
open my $in_fh, '<', $infile;
open my $out_fh, '>', $outfile;
```
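Pulling several of these recommendations together, a small sketch might look like the following. It assumes a throwaway directory from File::Temp and made-up file names; the point is only to show strict, warnings, autodie, and lexical filehandles confined to small scopes:

```perl
use strict;
use warnings;
use autodie;                       # open/close die with a message on failure
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);    # hypothetical scratch directory
my $outfile = "$dir/example.txt";       # hypothetical file name

{
    # $out_fh exists only inside this block.
    open my $out_fh, '>', $outfile;
    print {$out_fh} "hello\n";
    close $out_fh;
}

my $line;
{
    open my $in_fh, '<', $outfile;
    $line = <$in_fh>;
    close $in_fh;
}
print "read back: $line";

# No hand-written die message is needed: autodie supplies one.
my $error = '';
eval { open my $fh, '<', "$dir/no-such-file"; 1 } or $error = "$@";
print "autodie reported: $error\n";
```

The message autodie produces names the file and the reason for the failure, which is the part that is easy to get wrong by hand.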

— Ken

Re^2: Write special chars to PDF. UTF8? by tel2 (Pilgrim) on Feb 12, 2016 at 23:24 UTC
G'day from across the ditch, Ken. You're talkin' my language, mate.

Thanks very much for your time and all your tips.

The reason I wrote $f1 to the file and read it back into $f2 was just to make sure the variables weren't changing in the process, and from what I can tell they aren't.

I'm struggling to understand how this issue is about writing/reading the file. My reasons are:

1. If I remove my "quick hack" and change the webpage's "Output: $f1" line to "Output: $f2" (which it was meant to be originally - sorry), the e-acute appears on the webpage correctly.
2. If I print $f1 (which has not been read from a file) to the PDF (e.g. $text->text("PDF Output:$f1=$f2");) no acutes appear correctly.
3. If I write $f1 to a file as you have suggested, and read it back into $f3, it then contains more bytes than $f1, and printing $f3 to the PDF still doesn't print e-acute properly.

Below is some modified code which demonstrates this (sorry, I haven't brought it into the general coding standards you've suggested at this stage).

```
#!/usr/bin/perl
use lib "/home/tospeirs/perl5/lib/perl5";
use CGI;
use PDF::API2;
use bytes;
use constant mm => 25.4 / 72;

$cgi = new CGI;
$f1 = $cgi->param(f1);

if (defined($f1))
{
    open (FILE, ">utf8_test1.out") or die "Can't open outfile";
    print FILE $f1;
    close FILE;

    open (FILE, "<utf8_test1.out") or die "Can't open infile";
    $f2 = <FILE>;
    close FILE;

    open my $fh, '>:encoding(UTF-8)', 'utf8_test2.out';
    print $fh $f1;
    close $fh;

    open my $fh, '<:encoding(UTF-8)', 'utf8_test2.out';
    $f3 = <$fh>;
    close $fh;

    $lengths = "Lengths: f1=" . bytes::length($f1) . ", f2=" . bytes::length($f2) . ", f3=" . bytes::length($f3);
    $cmp  = ($f1 eq $f2) ? 'f1=f2' : 'f1<>f2';
    $cmp .= ($f1 eq $f3) ? ', f1=f3' : ', f1<>f3';

    $pdf = PDF::API2->new();
    $font1 = $pdf->corefont('Arial');
    $page = $pdf->page;  # Add blank page
    $page->mediabox(210/mm, 297/mm);
    $text = $page->text();
    $text->font($font1, 28);
    $text->translate(5/mm ,280/mm);

    # A quick hack to handle a couple of special chars
    #$f2 =~ s/\303\251/\351/g;  # e-acute
    #$f2 =~ s/\303\272/\372/g;  # u-acute

    $text->text("PDF Output:$f1=$f2=$f3");
    $pdf->saveas('utf8_test1.pdf');
}

print <<EOF;
Content-Type: text/html; charset=utf-8\n
<!DOCTYPE html>
<html lang='en-NZ'>
<head>
<title>Test UTF-8</title>
<meta charset='UTF-8'>
</head>
<body>
<form method='post'>
Input: <input type='text' name='f1' value='$f1'>
<br>
<input type='submit' name='submit' value='Submit'>
<br>
Output f2: $f2
<br>
Output f3: $f3
<br>
$lengths
<br>
$cmp
</form>
</body>
</html>
EOF
```

This is what I see on the webpage after I submit "Cliché.":

```
Input: Cliché.
Submit
Output f2: Cliché.
Output f3: ClichÃ©.
Lengths: f1=8, f2=8, f3=10
f1=f2, f1<>f3
```

And the PDF ends up containing this:

```
PDF Output:ClichÃ©.=ClichÃ©.=ClichÃƒÂ©.
```

As you can see, none of those 3 came out right in the PDF, and the $f3 looks extra long, as if it's been double-encoded or something. Check this octal dump out:

```
$ od -c utf8_test1.out
0000000   C   l   i   c   h 303 251   .
$ od -c utf8_test2.out
0000000   C   l   i   c   h 303 203 302 251   .
```

Any ideas?

Thanks.
tel2
Re^3: Write special chars to PDF. UTF8? by poj (Abbot) on Feb 13, 2016 at 09:44 UTC
Try using `decode()` for the pdf

```
#!/perl
use strict;
use warnings;
use CGI;
use CGI::Carp 'fatalsToBrowser';
use PDF::API2;
use Encode;

my $cgi = new CGI;
my $f1 = $cgi->param('f1');
my $f2 = decode('UTF-8', $f1 );

open OUT,'>','c:/temp/web/pdf.txt' or die; # change path to suit
print OUT "$f1 $f2";
close OUT;

my $pdf = PDF::API2->new()->mediabox('A4');
my $text = $pdf->page->text;
my $font1 = $pdf->corefont('Arial');
$text->font($font1, 36);

$text->translate(100,500);
$text->text("f1 = $f1");

$text->translate(100,600);
$text->text("f2 = $f2");

$pdf->saveas('c:/temp/web/utf8_test1.pdf'); # change path to suit

print <<EOF;
Content-Type: text/html; charset=UTF-8\n
<!DOCTYPE html>
<html lang='en-NZ'>
<head>
<title>Test UTF-8</title>
<meta charset="UTF-8">
</head><body>
$f1 $f2
<form method="post">
Input: <input type="text" name="f1" value="$f1"><br>
<input type="submit" name="submit" value="Submit">
</form></body></html>
EOF
```

poj
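For anyone wondering why `decode()` makes the difference, here is a sketch under one assumption: that the form value arrives as undecoded UTF-8 octets, which the 8-byte length and the first od dump in the thread suggest. The `octal_dump` helper and the variable names are made up for the example:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show every character of a string as an octal code.
sub octal_dump {
    my ($string) = @_;
    return join ' ', map { sprintf '%03o', ord $_ } split //, $string;
}

# "Cliché." as 8 undecoded UTF-8 octets (assumed form of the CGI value).
my $octets = "Clich\xC3\xA9.";

# Encoding those octets again treats each one as a separate character,
# which is the same thing an :encoding(UTF-8) output layer does to them.
my $double = encode('UTF-8', $octets);

# Decoding first turns the two octets back into one e-acute character.
my $chars = decode('UTF-8', $octets);

printf "octets: %2d  tail: %s\n", length $octets, octal_dump(substr $octets, 5);
printf "double: %2d  tail: %s\n", length $double, octal_dump(substr $double, 5);
printf "chars:  %2d  tail: %s\n", length $chars,  octal_dump(substr $chars, 5);
# octets:  8  tail: 303 251 056
# double: 10  tail: 303 203 302 251 056
# chars:   7  tail: 351 056
```

The doubled form matches the `303 203 302 251` sequence in the od dump of utf8_test2.out, and the decoded form holds the single character \351 that the commented-out "quick hack" was substituting by hand.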
Re^4: Write special chars to PDF. UTF8? by tel2 (Pilgrim) on Feb 14, 2016 at 23:18 UTC
Thank you very much for that code, poj! That's working for me.
tel2

