The following example shows all four combinations of two kinds of input and two output methods. The Arabic string comes in as a sequence of bytes that happen to be UTF-8, but Perl does not know that. The French one, on the other hand, is a proper character string (thanks to use utf8; and saving the source as UTF-8). Writing the bytes to the output without trying to interpret them gives the "correct" result, and so does writing the character string through a UTF-8 output layer. The other two combinations are wrong: the French string printed as bytes comes out as Latin-1, and the Arabic bytes printed through the UTF-8 layer get encoded a second time.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;    # string literals in this file are decoded from UTF-8

my %topurls = (
    # Raw bytes: the UTF-8 encoding of the Arabic title, never decoded.
    arabic => { title => join(q(), map { chr $_ }
                              216, 167, 217, 132, 216, 185, 216,
                              177, 216, 168, 217, 138, 216, 169),
                count => 42,
                users => 11,
              },
    # Character string: decoded by the parser because of "use utf8".
    french => { title => 'une chèvre goûte des légumes',
                count => 11,
                users => 42,
              },
);

open my $OUT, '>', 'topurls.htm' or die "Can't open output: $!";
print $OUT <<'END_HEADER';
<html>
<head>
<title>Top URLs</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<h3>Top URLs</h3>
<table cellpadding=10 border=1><tr><th>Link</th><th>Count</th><th>Users</th></tr>
END_HEADER

for my $u (keys %topurls) {
    my @line;
    $line[0] = '<a target="_blank" href="'.$u.'">'.$topurls{$u}{title}.'</a>';
    $line[1] = $topurls{$u}{count};
    $line[2] = $topurls{$u}{users};

    # No encoding layer: right for the Arabic bytes, wrong for the French string.
    binmode $OUT, ':bytes';
    print $OUT '<tr><td>Bytes: ', join('</td><td>', @line), "</td></tr>\n";

    # UTF-8 layer: right for the French string, double-encodes the Arabic bytes.
    binmode $OUT, ':utf8';
    print $OUT '<tr><td>UTF-8: ', join('</td><td>', @line), "</td></tr>\n";
}

print $OUT '</table></body></html>';
close $OUT;
Now you just have to find out what kind of input you have.
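
One possible way to find out, as a rough sketch only: try a strict UTF-8 decode and see whether it succeeds. The helper name to_characters is made up for this illustration, and the check is a heuristic, not a guarantee. A Latin-1 string whose bytes happen to form valid UTF-8 sequences would be misclassified, so knowing the encoding of your data source is always better than guessing.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of the original script): return a
# character string, decoding the input only if it looks like UTF-8 bytes.
sub to_characters {
    my ($string) = @_;

    # Code points above 255 cannot be bytes, so the string is already decoded.
    return $string if $string =~ /[^\x00-\xFF]/;

    # Strict decode: croaks on malformed UTF-8, which eval turns into undef.
    my $decoded = eval { decode('UTF-8', $string, FB_CROAK | LEAVE_SRC) };

    # If decoding failed, assume the input was a character string all along.
    return defined $decoded ? $decoded : $string;
}

# The same two kinds of input as in the example above.
my $arabic_bytes = join q(), map { chr $_ }
    216, 167, 217, 132, 216, 185, 216, 177, 216, 168, 217, 138, 216, 169;
my $french_chars = "une ch\x{E8}vre";    # è as a single character

# Both results are character strings now, so one output layer fits both.
binmode STDOUT, ':encoding(UTF-8)';
print to_characters($arabic_bytes), "\n";
print to_characters($french_chars), "\n";
```

Once every string has gone through such a normalising step, the whole file can be written through a single UTF-8 layer and there is no need to switch binmode per line.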