I think you are confused about encoding, which is easy to be: it is a confusing topic. See perlunitut, "Unicode and Strings" in Modern Perl, The Perl Unicode Cookbook ...
As you know, if you try to print a "wide" Unicode character, Perl gives you a warning:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
say "Ferrari 308 \x{1F44D}";
__END__
$ perl 1140714.pl
Wide character in say at 1140714.pl line 6.
Ferrari 308 👍
$
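Why the warning? The string holds one character whose code point is far too big for a single byte, so Perl has to pick some byte encoding on the way out and tells you it is guessing. Here is a minimal sketch of the character/byte difference. It uses utf8::encode only to peek at the bytes (the variable names are mine, and none of the fixes in this post need it):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $thumb = "\x{1F44D}";    # one character, code point U+1F44D

printf "characters:  %d\n", length $thumb;
printf "code point:  U+%04X (%s)\n", ord $thumb,
    ord($thumb) > 255 ? "does not fit in one byte" : "fits in one byte";

# utf8::encode is a built-in that converts a copy of the string, in
# place, from characters to the UTF-8 bytes that represent them.
my $bytes = $thumb;
utf8::encode($bytes);

printf "UTF-8 bytes: %d\n", length $bytes;
```

So "wide" just means "needs more than one byte", and the warning means nobody has told the filehandle which encoding to use.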
You can fix this as stevieb pointed out below, with binmode:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
say "Ferrari 308 \x{1F44D}";
__END__
$ perl 1140714.pl
Ferrari 308 👍
$
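If you want to convince yourself that it is the layer that silences the warning, here is a rough sketch that prints the same character before and after binmode. It assumes an in-memory filehandle as a stand-in for STDOUT and collects warnings in a variable (both are my choices for the demo, not part of the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# An in-memory filehandle standing in for STDOUT.
my $buffer = '';
open my $fh, '>', \$buffer or die "open: $!\n";

print {$fh} "\x{1F44D}";    # no layer yet: Perl warns
binmode $fh, ':utf8';
print {$fh} "\x{1F44D}";    # layer in place: no warning

close $fh or die "close: $!\n";

printf "warnings seen: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
printf "bytes written: %d\n", length $buffer;
```

Both prints put the same UTF-8 bytes in the buffer. The difference is that after binmode Perl has been told what to do instead of having to guess.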
If you want to put literal Unicode characters in your Perl source, you can't just expect Perl to know what they are:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
say "Ferrari 308 👍";
__END__
$ perl 1140714.pl
Ferrari 308
... fix that with use utf8:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
use utf8;
say "Ferrari 308 👍";
__END__
$ perl 1140714.pl
Ferrari 308 👍
$
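Roughly speaking, use utf8 tells Perl that the source file is UTF-8, so the bytes your editor saved for a literal get read as characters instead of one character per byte. A small sketch of that difference, written with byte escapes so the snippet itself stays plain ASCII (the variable names are made up, and utf8::decode stands in here for what the pragma does to literals):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The four bytes a UTF-8 editor saves for the thumbs-up character.
my $as_saved = "\xF0\x9F\x91\x8D";

# Without "use utf8", Perl sees the literal as one character per byte.
printf "as saved:   %d characters\n", length $as_saved;

# utf8::decode converts the bytes in place into the characters they
# encode. It is a built-in and does not require "use utf8".
my $decoded = $as_saved;
utf8::decode($decoded) or die "not valid UTF-8\n";

printf "decoded:    %d character\n", length $decoded;
printf "code point: U+%04X\n", ord $decoded;
```

That is why the version without use utf8 prints junk: four separate characters each get encoded by the :utf8 layer on the way out.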
If you are going to read in data that might have Unicode characters, e.g.:
$ cat 1140714.txt
Lotus lan 👍
任意のスーパーカー
$
... you can't expect Perl to know what you're giving it:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
use utf8;
open my $in, '<', '1140714.txt' or die "open: $!\n";
print for (<$in>);
__END__
$ perl 1140714.pl
Lotus lan
任の
... you can fix that by using an I/O layer in your open:
$ cat 1140714.pl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
use utf8;
open my $in, '< :utf8', '1140714.txt' or die "open: $!\n";
print for (<$in>);
__END__
$ perl 1140714.pl
Lotus lan 👍
任意のスーパーカー
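The same thing can be seen without touching the disk. Below is a sketch that reads one line from an in-memory "file" twice, once with no layer and once with an encoding layer. The helper first_line and the sample bytes are invented for the illustration, and it uses :encoding(UTF-8), the stricter spelling of :utf8 that also validates the input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# UTF-8 bytes for one katakana character (U+30B9) plus a newline,
# standing in for a line of 1140714.txt. Escapes keep this file ASCII.
my $file_bytes = "\xE3\x82\xB9\n";

# Hypothetical helper: read the first line through the given open
# mode and return it without the trailing newline.
sub first_line {
    my ($mode) = @_;
    open my $in, $mode, \$file_bytes or die "open: $!\n";
    my $line = <$in>;
    close $in or die "close: $!\n";
    chomp $line;
    return $line;
}

my $raw     = first_line('<');
my $decoded = first_line('<:encoding(UTF-8)');

printf "no layer:       %d characters\n", length $raw;
printf "encoding layer: %d character (U+%04X)\n",
    length $decoded, ord $decoded;
```

With no layer Perl hands you the raw bytes, and printing those through a :utf8 STDOUT encodes them a second time, which is where the junk in the earlier output comes from.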
If you print your Unicode data to a filehandle that has no encoding layer, you'll get the wide-character warning again:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
use utf8;
open my $in, '< :utf8', '1140714.txt' or die "open: $!\n";
open my $out, '>', '1140714.out' or die "open: $!\n";
print $out $_ for (<$in>);
close $out or die "close: $!\n";
__END__
$ perl 1140714.pl
Wide character in print at 1140714.pl line 11, <$in> line 2.
Wide character in print at 1140714.pl line 11, <$in> line 2.
$
... fix it with an I/O layer:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
binmode STDOUT, ':utf8';
use utf8;
open my $in, '< :utf8', '1140714.txt' or die "open: $!\n";
open my $out, '> :utf8', '1140714.out' or die "open: $!\n";
print $out $_ for (<$in>);
close $out or die "close: $!\n";
__END__
$ perl 1140714.pl
$ cat 1140714.out
Lotus lan 👍
任意のスーパーカー
$
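As a sanity check on the whole chain, here is a sketch that writes a string through an output layer, looks at the file size, and reads it back through an input layer. The temporary file from File::Temp and the variable names are my own choices for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "Lotus \x{1F44D}";    # 7 characters, one of them wide

# A scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close: $!\n";

# Characters go in, UTF-8 bytes land on disk.
open my $out, '>:encoding(UTF-8)', $path or die "open: $!\n";
print {$out} $text;
close $out or die "close: $!\n";

my $size = -s $path;

# UTF-8 bytes come off disk, characters come out.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!\n";
my $back = do { local $/; <$in> };    # slurp the whole file
close $in or die "close: $!\n";

printf "characters written: %d\n", length $text;
printf "bytes on disk:      %d\n", $size;
printf "round trip intact:  %s\n", $back eq $text ? "yes" : "no";
```

If every handle in a script should be UTF-8, the open pragma (use open qw(:std :encoding(UTF-8));) can set the layers for STDIN, STDOUT, STDERR and later open calls in one line, instead of repeating them on each handle.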
No manual encoding or decoding needed at all: the I/O layers take care of it.
Hope this helps!
Update: Added examples and links
The way forward always starts with a minimal test.