The following example shows all four combinations of two kinds of input and two output methods. The Arabic string comes in as a sequence of bytes, and Perl does not know they should be read as UTF-8. The French string, on the other hand, is a proper character string (thanks to use utf8; and saving the source as UTF-8). Writing the bytes to the output without trying to interpret them gives the "correct" result, and so does writing the character string through a UTF-8 output layer. The other two combinations are wrong.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;    # string literals in this file are decoded as UTF-8

my %topurls = (
    # Raw UTF-8 bytes; Perl has not been told that they encode characters.
    arabic => { title => join(q(), map chr $_,
                              216, 167, 217, 132, 216, 185, 216,
                              177, 216, 168, 217, 138, 216, 169),
                count => 42,
                users => 11,
              },
    # A character string, decoded from the source thanks to use utf8.
    french => { title => 'une chèvre goûte des légumes',
                count => 11,
                users => 42,
              },
);

open my $OUT, '>', 'topurls.htm' or die "Can't open output: $!";
print $OUT <<'END_HEADER';
<html>
<head>
<title>Top URLs</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<h3>Top URLs</h3>
<table cellpadding=10 border=1><tr><th>Link</th><th>Count</th><th>Users</th></tr>
END_HEADER

for my $u (keys %topurls) {
    my @line;
    $line[0] = '<a target="_blank" href="'.$u.'">'.$topurls{$u}{title}.'</a>';
    $line[1] = $topurls{$u}{count};
    $line[2] = $topurls{$u}{users};

    # Write the row without an encoding layer: right for the byte string only.
    binmode $OUT, ':bytes';
    print $OUT '<tr><td>Bytes: ', join('</td><td>', @line), "</td></tr>\n";

    # Write the row through the UTF-8 layer: right for the character string only.
    binmode $OUT, ':utf8';
    print $OUT '<tr><td>UTF-8: ', join('</td><td>', @line), "</td></tr>\n";
}

print $OUT '</table></body></html>';
close $OUT;
Now you just have to find out what kind of input you have.
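If you cannot tell in advance, one possible approach is to normalize everything to character strings first and then use a single output layer. The sketch below is only a heuristic and is not taken from the code above: decode_if_utf8_bytes is a hypothetical helper name, and it assumes that any string which contains no code point above 255 and is well-formed UTF-8 really is undecoded UTF-8. A Latin-1 string that happens to form valid UTF-8 would fool it. The in-memory file stands in for topurls.htm.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post). Input that looks like
# undecoded UTF-8 bytes is decoded; anything else is returned unchanged.
# This is a heuristic and can be fooled.
sub decode_if_utf8_bytes {
    my ($str) = @_;

    # A code point above 255 cannot be a byte, so this is already text.
    return $str if $str =~ /[^\x00-\xFF]/;

    # utf8::decode works in place and returns false on malformed UTF-8,
    # in which case the copy is left untouched.
    my $copy = $str;
    return utf8::decode($copy) ? $copy : $str;
}

# Same byte values as the Arabic title in the example.
my $arabic_bytes = join q(), map { chr($_) }
    216, 167, 217, 132, 216, 185, 216, 177, 216, 168, 217, 138, 216, 169;

# A character string, as use utf8 would produce from a UTF-8 source file.
my $french_chars = "une ch\x{e8}vre go\x{fb}te des l\x{e9}gumes";

my $arabic = decode_if_utf8_bytes($arabic_bytes);
my $french = decode_if_utf8_bytes($french_chars);

# Both titles are character strings now, so one encoding layer fits both.
open my $OUT, '>:encoding(UTF-8)', \my $html or die "Can't open output: $!";
print {$OUT} "<td>$arabic</td><td>$french</td>\n";
close $OUT;

printf "Arabic title: %d bytes in, %d characters after decoding\n",
    length $arabic_bytes, length $arabic;
```

When you do know that the input is UTF-8 bytes, decoding it explicitly (for example with decode from the Encode module) is more reliable than guessing.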