Dear highly esteemed PerlMonks
Update: how do I make the PerlMonks web site display the Greek and Hebrew characters themselves, instead of their numeric character codes?
I am working on a project which deals with data in foreign languages. My Perl scripts were running fine.
I then wanted to use Tie::File, since this is a neat concept (and saves time and coding).
It seems that Tie::File fails under Unicode/UTF-8 (unless I am missing something).
Here is a program that demonstrates the problem (the data is a mix of English, Greek and Hebrew):
use strict;
use warnings;
use 5.014;
use Win32::Console;
use autodie;
use warnings qw< FATAL utf8 >;
use Carp;
use Carp::Always;
use utf8;
use feature qw< unicode_strings>;
use charnames qw< :full>;
use Tie::File;
my ($i);
my ( $FileName);
my (@Tied);
binmode STDOUT, ':unix:utf8';
binmode STDERR, ':unix:utf8';
binmode $DB::OUT, ':unix:utf8' if $DB::OUT; # for the debugger
Win32::Console::OutputCP(65001); # Set the console code page to UTF-8
$FileName = 'E:\\My Documents\\Technical\\Perl\\Eclipse workspace\\FIBI OCR\\Work\\'.
    'Tie File test res.txt';
tie @Tied, 'Tie::File', $FileName,
    recsep     => "\x0D\x0A",
    discipline => ':encoding(utf8)'
    or confess 'tie @Tied failed';
$i =0;
while (<DATA>) {
chomp;
$Tied[$i] = $_;
++$i;
} # end while (<DATA>)
$i =0;
foreach (@Tied) {
say "$i $Tied[$i]";
++$i;
} # end foreach (@Tied)
untie @Tied;
__DATA__
τι κάνετε;
πάρτε το ή αφήστε το
שלום חברים
abc לא כןכן efg
מתי ולאן This is it
מעכשיו לעכשיו
Σήμερα είναι Τρίτη
Θέλω να φάω
τι κάνετε;
שורה מס' 5
This produces a huge cascade of warnings; here are some of them:
utf8 "\xCE" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xCF" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xD7" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xD7" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
Then it prints this on STDOUT:
0 τι κάνετε;
1 πάρτε το ή αφήστε το
2 שלום חברים
3 abc לא כןכן efg
4 מתי ולאן This is it
5 מעכשיו לעכשיו
6 Σήμερα είναι Τρίτη
7 Θέλω να φάω
8 τι κάνετε;
9 שורה מס' 5
10
11
12
13
14 \xA4\xΘέλω\xA8\x
15
16
17
18
19
Note that lines 0 through 9 are OK, but lines 10 through 19 came from nowhere!?
In addition, the output file contains corrupted data:
τι κάνϏN͏Ŏՠτή
+;στε של חברءbc 
+500;ؗܗࠗܗߠeמתול&
+#1488;ן This is מעיו לע
+99;؎Ďώݎ֏ναι Τρ&#
+920;έώގѠφϏŎ٠κτ&#
+949;;שרה מס'
\xA4\xΘέλω\xA8\x
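To see what actually reached the disk, independent of how an editor or the console renders it, a small sketch like the following prints each CRLF-terminated record of a file as hex bytes. Intact Greek and Hebrew letters should appear as byte pairs starting with CE/CF and D7 respectively; stray single bytes would show where the corruption starts. The sub name `hex_records` and the script name `hexdump.pl` are illustrative only:

```perl
use strict;
use warnings;

# Return one string of hex bytes per CRLF-terminated record of $file.
# The file is read in :raw mode, so no decoding happens; the CRLF
# separator is left on each record, so it shows up as "0D 0A".
sub hex_records {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    local $/ = "\x0D\x0A";                 # same record separator as the tie
    my @out;
    while (my $rec = <$fh>) {
        push @out, join ' ', map { sprintf '%02X', $_ } unpack 'C*', $rec;
    }
    close $fh;
    return @out;
}

# Usage: perl hexdump.pl "path to output file"
if (@ARGV) {
    my $n = 0;
    printf "%3d: %s\n", $n++, $_ for hex_records($ARGV[0]);
}
```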
Something is very wrong here: am I missing something, or can Tie::File simply not cope with Unicode/UTF-8?
I am running Strawberry Perl 5.14 on a Windows 7 system.
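If the suspicion is that the `discipline => ':encoding(utf8)'` option is what goes wrong, one way to check is to take it out of the picture: tie the file as plain bytes and do the UTF-8 conversion by hand with the core Encode module, encoding each line before it is stored and decoding each record after it is fetched. This is a minimal sketch under that assumption, not a confirmed fix; the file name `tie_file_test_bytes.txt` and the three sample lines are placeholders, and the Win32 console setup is left out so it runs on any platform:

```perl
use strict;
use warnings;
use utf8;                               # literal Greek/Hebrew below
use Encode qw(encode decode);
use Tie::File;

binmode STDOUT, ':encoding(UTF-8)';

my $file  = 'tie_file_test_bytes.txt';  # hypothetical output file
my @lines = ('τι κάνετε;', 'שלום חברים', 'abc לא כן efg');

# No 'discipline' option: Tie::File only ever sees bytes
tie my @tied, 'Tie::File', $file, recsep => "\x0D\x0A"
    or die "Cannot tie $file: $!";
@tied = ();                             # start from an empty file

# Encode characters to UTF-8 bytes before storing each record
$tied[$_] = encode('UTF-8', $lines[$_]) for 0 .. $#lines;

# Decode the bytes back into characters after fetching each record
my @back = map { decode('UTF-8', $_) } @tied;
print "$_ $back[$_]\n" for 0 .. $#back;

untie @tied;
```

If the lines round-trip correctly this way but not with the `discipline` option, that would narrow the problem down to the interaction between Tie::File and the encoding layer.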
Many TIA - Helen
Note: cross-posted on http://stackoverflow.com/questions/13209474/