http://www.perlmonks.org?node_id=1005466


in reply to utf8 characters in Data::Dumper

Perl uses an internal format for strings, and it's not necessarily utf8. So whenever you want to output a non-ASCII string, you must explicitly encode it. I don't think Data::Dumper changes anything about this issue. It will output the string as it is encoded, so you really need to encode it first with an encoding supported by your terminal (and that will probably be utf8).

use strict;
use warnings;
use Data::Dumper;
use utf8;
use Encode qw(encode);
print Dumper encode 'utf8', "Hiragana Letter あ";

(here I tried to paste the actual hiragana letter あ but it did not work out well inside the code markup.)

IIRC the utf8 pragma only tells Perl that your program's source file is utf8 encoded. It does not say anything about output to stdout.
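To make the difference between characters and encoded bytes concrete, here is a minimal sketch. The variable names are only for illustration, and the letter is written as an escape so the snippet does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character in Perl's internal representation (HIRAGANA LETTER A).
my $chars = "\x{3042}";

# The same letter as UTF-8 encoded bytes, ready to be printed.
my $bytes = encode('UTF-8', $chars);

printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;
printf "hex:        %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

The character string has length 1, while the encoded string has length 3 (E3 81 82). It is the encoded string that a utf8 terminal expects to receive.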

Re^2: utf8 characters in Data::Dumper
by tobyink (Canon) on Nov 25, 2012 at 08:14 UTC

    This is an interesting workaround. It "works" because it prevents Data::Dumper from seeing the hiragana letter at all. Instead it sees three separate bytes: 0xE3, 0x81, 0x82 and outputs them separately. The terminal then reads those bytes and, assuming it's set to display UTF-8, reassembles them into a single hiragana character.

    It breaks down if you set $Data::Dumper::Useqq to true.
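A small sketch of that difference, assuming a reasonably recent Data::Dumper. The variable names are made up for the example, and the exact escape style may vary between versions.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(encode);

# The three bytes E3 81 82, i.e. the UTF-8 encoding of U+3042.
my $bytes = encode('UTF-8', "\x{3042}");

my ($plain, $escaped);
{
    local $Data::Dumper::Useqq = 0;
    $plain = Dumper($bytes);      # bytes are passed through as they are
}
{
    local $Data::Dumper::Useqq = 1;
    $escaped = Dumper($bytes);    # each high byte is written as an escape
}

print $plain, $escaped;
```

With Useqq off, a utf8 terminal reassembles the raw bytes into the letter. With Useqq on, Data::Dumper escapes each byte individually, so the terminal has nothing to reassemble.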

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re^2: utf8 characters in Data::Dumper
by soonix (Canon) on Jan 24, 2020 at 11:18 UTC
    (here I tried to paste the actual hiragana letter あ but it did not work out well inside the code markup.)
    You could use charnames, like this:
    print Dumper encode 'utf8', "Hiragana Letter \N{HIRAGANA LETTER A}";
    This works with PerlMonks, and if you use a web interface to your version control system, it will look nicer there than the "¿" or whatever gets displayed there.

    If your Perl is older than v5.16, it needs an explicit use charnames ':full'; to work.
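As a quick check that the named escape really is the same character, here is a minimal sketch (the variable name is arbitrary):

```perl
use strict;
use warnings;
use charnames ':full';   # explicit load; newer Perls load it automatically for \N{...}

my $letter = "\N{HIRAGANA LETTER A}";

# Print the code point and look the official name up again from it.
printf "U+%04X %s\n", ord($letter), charnames::viacode(ord $letter);
```

This prints only ASCII, so it can be run on any terminal without worrying about the output encoding.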

Re^2: utf8 characters in Data::Dumper
by remiah (Hermit) on Nov 25, 2012 at 09:01 UTC

    Thanks for the reply, grondilu.

    I don't think Data::Dumper changes anything about this issue.

    I see. This outputs Hiragana Letter A.

    perl -MData::Dumper -Mutf8 -MEncode=encode -e 'print Dumper encode("UTF-8","Hiragana Letter A ...")'
    $VAR1 = 'Hiragana Letter A ...';
    
    I have been using Data::Dumper carelessly so far, because it has been very handy for me. And now I am thinking of overriding the "Dumper" function, and ... it does not feel good.

    case 1: Override Dumper anyway.

    use Data::Dumper;
    use Encode::Deep;
    no strict 'refs';
    no warnings 'redefine';
    local *Dumper = sub {
        print "in Dumper override\n";
        return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
    };
    use strict;
    use warnings;
    use utf8;
    print Dumper "Hiragana Letter A ...";

    Maybe I should not do this.

    case 2: Subclassing(?) Data::Dumper.
    I would like to make a wrapper class like this.

    use strict;
    use warnings;
    {
        package MyDumper;
        use Any::Moose;
        use Data::Dumper ();    # load explicitly, the wrapper calls Data::Dumper::Dumper
        use Encode::Deep;
        sub Dumper {
            my $self = shift;
            return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
        }
        1;
    }
    use utf8;
    my $d = MyDumper->new();
    print $d->Dumper("Hiragana Letter A ...");

    I could not "extends" Data::Dumper, because its new() function is forced to have 2 args ... (I should learn more about Data::Dumper...).

    Or do you know of any other module that takes care of encoding?
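One possible alternative to overriding or subclassing is a plain wrapper function that uses only core modules. This is only a sketch: dump_utf8 is a hypothetical name, and it assumes that Data::Dumper with Useqq writes non-ASCII characters as \x{...} escapes.

```perl
use strict;
use warnings;
use Data::Dumper ();
use Encode qw(encode);

# Hypothetical helper: dump with escapes, turn the escapes back into
# characters, then encode the whole text as UTF-8 for the terminal.
sub dump_utf8 {
    local $Data::Dumper::Useqq = 1;              # ask for "\x{3042}"-style escapes
    my $text = Data::Dumper::Dumper(@_);
    $text =~ s/\\x\{([0-9a-fA-F]+)\}/chr hex $1/ge;
    return encode('UTF-8', $text);
}

print dump_utf8({ letter => "\x{3042}" });
```

The substitution is naive: data that itself contains a literal backslash followed by x{...} would be changed as well, so the result is meant for reading on a terminal, not for feeding back into eval.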