Re: Writing HTML file with UTF-8 chars

by choroba (Bishop)
on Apr 27, 2013 at 16:19 UTC

in reply to Writing HTML file with UTF-8 chars

Without knowing what is in %topurls, we cannot help you much. It seems the hash contains strings in an encoding other than UTF-8. How do you populate %topurls?
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Replies are listed 'Best First'.
Re^2: Writing HTML file with UTF-8 chars
by cormanaz (Chaplain) on Apr 27, 2013 at 16:25 UTC
    It is populated from a query to a PgSQL DB. I'm sure the chars in the db are UTF-8, in fact I've written them to an Excel sheet using encode() and it worked fine. Also when I replace the loop in the example with
    foreach my $u (keys %topurls) {
        my @line;
        $line[0] = $u;
        $line[1] = $topurls{$u}{title};
        $line[2] = $topurls{$u}{count};
        $line[3] = $topurls{$u}{users};
        print join("\t",@line)."\n";
    }
    It prints correctly in the debugger (Komodo) output window.
      I'm sure the chars in the db are UTF-8...

      Have you set the pg_enable_utf8 flag in DBD::Pg? Having valid UTF-8 is important, but so is telling Perl which encoding to use to interpret the incoming data.
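To illustrate what "telling Perl which encoding to use" means, here is a minimal sketch of doing by hand what a driver-level flag is meant to take care of. It assumes the data arrives as raw UTF-8 bytes; the variable names are made up for illustration, and the byte values are the Arabic title used elsewhere in this thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(decode);

# The Arabic title from this thread, as raw (undecoded) UTF-8 bytes.
my $bytes = join q(), map chr, 216, 167, 217, 132, 216, 185,
                               216, 177, 216, 168, 217, 138, 216, 169;

# Tell Perl how to interpret the bytes: decode them into characters.
my $chars = decode('UTF-8', $bytes);

# Before decoding Perl counts bytes, afterwards it counts characters.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

The two lengths differ because each Arabic letter occupies two bytes in UTF-8. Until the string is decoded, Perl has no way of knowing that the pairs belong together.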

      In the following example, all four combinations of two kinds of input and two output methods are shown. The Arabic string comes in as a sequence of bytes, without Perl knowing that it should be UTF-8. The French one, on the other hand, is a proper UTF-8 character string (thanks to use utf8; and saving the source as UTF-8). When the bytes are written to the output without any attempt to interpret them, we get the "correct" result. The same holds for the character string written through the UTF-8 layer. The other two combinations are wrong.
      #!/usr/bin/perl
      use warnings;
      use strict;
      use utf8;

      my %topurls = (arabic => { title => join(q(), map chr $_,
                                               216, 167, 217, 132, 216, 185,
                                               216, 177, 216, 168, 217, 138,
                                               216, 169),
                                 count => 42,
                                 users => 11,
                               },
                     french => { title => 'une chèvre goûte des légumes',
                                 count => 11,
                                 users => 42,
                               }
                    );

      open my $OUT, '>', 'topurls.htm' or die "Can't open output: $!";
      print $OUT <<'END_HEADER';
      <html>
      <head>
      <title>Top URLs</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      </head>
      <body>
      <h3>Top URLs</h3>
      <table cellpadding=10 border=1><tr><th>Link</th><th>Count</th><th>Users</th></tr>
      END_HEADER

      for my $u (keys %topurls) {
          my @line;
          $line[0] = '<a target="_blank" href="'.$u.'">'.$topurls{$u}{title}.'</a>';
          $line[1] = $topurls{$u}{count};
          $line[2] = $topurls{$u}{users};
          binmode $OUT, ':bytes';
          print $OUT '<tr><td>Bytes: ', join('</td><td>', @line), "</td></tr>\n";
          binmode $OUT, ':utf8';
          print $OUT '<tr><td>UTF-8: ', join('</td><td>', @line), "</td></tr>\n";
      }
      print $OUT '</table></body></html>';
      close $OUT;
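The same four combinations can also be inspected without a browser by printing to an in-memory file and dumping the resulting bytes as hex. This is only a sketch: written_bytes is a hypothetical helper, the two short sample strings are made up, and the :raw and :encoding(UTF-8) layers stand in for the :bytes and :utf8 layers used above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Print $string through the given PerlIO layer into an in-memory file
# and return the bytes that would have ended up on disk.
sub written_bytes {
    my ($string, $layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "Can't open in-memory file: $!";
    print {$fh} $string;
    close $fh or die "Can't close in-memory file: $!";
    return $buffer;
}

my $utf8_bytes = "\xD8\xA7\xD9\x84";   # two Arabic letters, still undecoded bytes
my $characters = "ch\x{E8}vre";        # six characters, one of them U+00E8

for my $layer (':raw', ':encoding(UTF-8)') {
    for my $input ($utf8_bytes, $characters) {
        # Show every output byte as two hex digits.
        my $hex = join ' ',
                  map { sprintf('%02X', ord $_) }
                  split //, written_bytes($input, $layer);
        print "$layer\t$hex\n";
    }
}
```

The undecoded bytes pass through :raw unchanged but are encoded a second time by the encoding layer. The character string comes out as valid UTF-8 only through the encoding layer, while :raw emits a lone E8 byte that is not valid UTF-8.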

      Now you just have to find out what kind of input you have.
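One possible way to find out is sketched below. guess_kind is a hypothetical helper, not part of any module, and it is only a heuristic: a Latin-1 string can happen to be valid UTF-8 by accident, so treat the answer as a hint rather than proof.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: guess what kind of data a scalar holds.
sub guess_kind {
    my ($string) = @_;

    # Code points above 255 cannot be raw bytes.
    return 'characters' if $string =~ /[^\x00-\xFF]/;

    # Pure ASCII looks the same whether decoded or not.
    return 'ascii' if $string !~ /[^\x00-\x7F]/;

    # decode() with a CHECK argument may modify its input, so use a copy.
    my $copy  = $string;
    my $valid = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $valid ? 'undecoded UTF-8 bytes' : 'Latin-1 bytes or characters';
}

# Sample inputs, written with escapes so the source file stays ASCII.
for my $sample ("\xD8\xA7\xD9\x84", "\x{627}\x{644}", "ch\x{E8}vre", 'plain') {
    printf "%d chars: %s\n", length($sample), guess_kind($sample);
}
```

If the titles coming from the database are reported as undecoded UTF-8 bytes, they need to be decoded (or the driver told to do so) before being printed through a UTF-8 output layer.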

      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
