The solution was:
- Add use utf8;
- Add binmode OUT, ":utf8"; after the file open
- Do not use encode() on the values
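For anyone finding this later, here is a minimal sketch of those three points put together. The hash contents and the file name topurls_sketch.txt are made up for illustration, and I use a lexical handle instead of the bareword OUT; the real data came from the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this source file are decoded as UTF-8

# Hypothetical data standing in for the real %topurls
my %topurls = (
    'http://example.com/' => {
        title => 'une chèvre goûte des légumes',
        count => 11,
        users => 42,
    },
);

my $file = 'topurls_sketch.txt';    # hypothetical file name
open my $out, '>', $file or die "Can't open $file: $!";
binmode $out, ':utf8';              # characters are encoded to UTF-8 on output

for my $u (sort keys %topurls) {
    # No encode() here: the layer on the handle encodes exactly once
    print {$out} join("\t", $u, @{ $topurls{$u} }{qw(title count users)}), "\n";
}
close $out or die "Can't close $file: $!";
```

Calling encode() on the values and also setting the layer would encode the text twice, which is one way to end up with garbage in the output.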
Thanks for the comebacks everyone.
On the second point:
add binmode OUT, ":utf8"; after the file open
I think it is better to use binmode OUT, ":encoding(UTF-8)"; because ':encoding(UTF-8)' checks that the data is actually valid UTF-8, while ':utf8' just marks the data as UTF-8 without further checking. Please see the documentation for binmode.
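A small sketch of the difference, assuming a reasonably recent Perl with in-memory filehandles. The helper bytes_written is a name I made up for this illustration. It writes a string through each layer and compares the resulting bytes. Expect warnings on STDERR for the lone surrogate.

```perl
use strict;
use warnings;

# Hypothetical helper: write a string through the given layer into an
# in-memory buffer and return the raw bytes that came out.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "Can't open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

my $good = "ch\x{E8}vre";    # ordinary text
my $bad  = chr(0xD800);      # a lone surrogate, not allowed in strict UTF-8

my $same   = bytes_written(':utf8', $good) eq bytes_written(':encoding(UTF-8)', $good);
my $lax    = bytes_written(':utf8', $bad);
my $strict = bytes_written(':encoding(UTF-8)', $bad);

print "valid text: ", ($same ? "identical bytes" : "different bytes"), "\n";
print "surrogate:  ", ($lax eq $strict ? "identical bytes" : "different bytes"), "\n";
```

For valid text both layers should produce the same bytes. They only diverge on data that strict UTF-8 does not allow, which ':utf8' passes through unchecked.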
If you tell me, I'll forget.
If you show me, I'll remember.
If you involve me, I'll understand.
--- Author unknown to me
Without knowing what is in %topurls, we cannot help you much. It seems the hash contains strings in an encoding other than UTF-8. How do you populate %topurls?
It is populated from a query to a PgSQL DB. I'm sure the chars in the db are UTF-8, in fact I've written them to an Excel sheet using encode() and it worked fine. Also when I replace the loop in the example with
foreach my $u (keys %topurls) {
    my @line;
    $line[0] = $u;
    $line[1] = $topurls{$u}{title};
    $line[2] = $topurls{$u}{count};
    $line[3] = $topurls{$u}{users};
    print join("\t",@line)."\n";
}
It prints correctly in the debugger (Komodo) output window.
The following example shows all four combinations of two kinds of input and two output methods. The Arabic string comes in as a sequence of bytes, without Perl knowing it should be UTF-8. The French one, on the other hand, is a properly decoded character string (thanks to use utf8; and saving the source as UTF-8). When the bytes are written to the output without any attempt to interpret them, we get the "correct" result. The same goes for the character string written to UTF-8 output. The other two combinations are wrong.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;

my %topurls = (arabic => { title => join(q(), map chr $_,
                               216, 167, 217, 132, 216, 185, 216,
                               177, 216, 168, 217, 138, 216, 169),
                           count => 42,
                           users => 11,
                         },
               french => { title => 'une chèvre goûte des légumes',
                           count => 11,
                           users => 42,
                         }
              );

open my $OUT, '>', 'topurls.htm' or die "Can't open output: $!";
print $OUT <<'END_HEADER';
<html>
<head>
<title>Top URLs</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<h3>Top URLs</h3>
<table cellpadding=10 border=1><tr><th>Link</th><th>Count</th><th>Users</th></tr>
END_HEADER

for my $u (keys %topurls) {
    my @line;
    $line[0] = '<a target="_blank" href="'.$u.'">'.$topurls{$u}{title}.'</a>';
    $line[1] = $topurls{$u}{count};
    $line[2] = $topurls{$u}{users};

    binmode $OUT, ':bytes';
    print $OUT '<tr><td>Bytes: ', join('</td><td>', @line), "</td></tr>\n";

    binmode $OUT, ':utf8';
    print $OUT '<tr><td>UTF-8: ', join('</td><td>', @line), "</td></tr>\n";
}

print $OUT '</table></body></html>';
close $OUT;
Now you just have to find out what kind of input you have.
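One way to check is a strict decode with Encode. The sketch below is only a heuristic, and classify_input is a name invented here: a string with code points above 255 cannot be raw bytes, pure ASCII is the same either way, and anything else is treated as UTF-8 bytes only if it decodes cleanly. Text in another single-byte encoding can occasionally pass that test by coincidence.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: guess whether a scalar holds UTF-8 bytes or characters.
sub classify_input {
    my ($value) = @_;
    return 'ascii'      if $value !~ /[^\x00-\x7F]/;    # same either way
    return 'characters' if $value =~ /[^\x00-\xFF]/;    # cannot be raw bytes
    # Only code points 128..255 remain: try a strict UTF-8 decode.
    my $ok = eval { decode('UTF-8', $value, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf8-bytes' : 'characters-or-other-encoding';
}

# The same two kinds of input as in the example above
my $arabic = join q(), map { chr } 216, 167, 217, 132, 216, 185, 216,
                                   177, 216, 168, 217, 138, 216, 169;
my $french = "une ch\x{E8}vre";

print 'arabic: ', classify_input($arabic), "\n";    # utf8-bytes
print 'french: ', classify_input($french), "\n";    # characters-or-other-encoding
```

If the values from the database come out as utf8-bytes, decode them once on input (or have the driver do it) and let the output layer do the encoding.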