http://www.perlmonks.org?node_id=1005452

remiah has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks.

When I print variables with Data::Dumper, it prints UTF-8 characters as escape sequences.

>perl -MData::Dumper -Mutf8 -e 'print Dumper "Hiragana Letter A .. for example ..."'
$VAR1 = "\x{3042}";
Is there a way to dump UTF-8 characters as is? I have wondered about this several times and looked for an option like $Data::Dumper::Encoding='UTF-8', but I have not found one so far.

So I would like to ask the monks for suggestions and wisdom.

regards.

Replies are listed 'Best First'.
Re: utf8 characters in Data::Dumper
by tobyink (Canon) on Nov 25, 2012 at 08:07 UTC

    It's a limitation of Data::Dumper. Try using a different dumping module, such as Data::Printer.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Thanks tobyink.

      I would like to check out Data::Printer. Its "filter" function looks nice to me.
      I was also going to look at other modules like FreezeThaw or YAML.

Re: utf8 characters in Data::Dumper
by grondilu (Friar) on Nov 25, 2012 at 06:54 UTC

    Perl uses an internal format for strings, and it's not necessarily utf8. So whenever you want to output a non-ASCII string, you must explicitly encode it. I don't think Data::Dumper changes anything about this issue. It will output the string as it is encoded, so you really need to encode it first with an encoding supported by your terminal (and that will probably be utf8).

    use strict;
    use warnings;
    use Data::Dumper;
    use utf8;
    use Encode qw(encode);
    print Dumper encode 'utf8', "Hiragana Letter あ";

    (here I tried to paste the actual hiragana letter あ but it did not work out well inside the code markup.)

    IIRC the utf8 pragma only tells Perl that you intend to use a utf8 encoded file as a source for your program. It does not say anything about output to stdout.

      This is an interesting workaround. It "works" because it prevents Data::Dumper from seeing the hiragana letter at all. Instead it sees three separate bytes: 0xE3, 0x81, 0x82 and outputs them separately. The terminal then reads those bytes and, assuming it's set to display UTF-8, reassembles them into a single hiragana character.

      It breaks down if you set $Data::Dumper::Useqq to true.
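A minimal sketch of that breakdown, assuming a stock Data::Dumper and the core Encode module. The variable names are only illustrative, and the character is written as \x{3042} so the example does not depend on the source file's encoding. With Useqq off, the encoded bytes pass through untouched; with Useqq on, Data::Dumper escapes every non-ASCII byte (typically as octal escapes such as \343\201\202), so the terminal never sees the UTF-8 sequence.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(encode);

# Encode the single character U+3042 into its three UTF-8 bytes (E3 81 82).
my $bytes = encode('UTF-8', "\x{3042}");

# Useqq off: the dump contains the raw bytes inside single quotes.
my $plain = do { local $Data::Dumper::Useqq = 0; Dumper($bytes) };

# Useqq on: the dump is double-quoted and every high byte is escaped.
my $qq = do { local $Data::Dumper::Useqq = 1; Dumper($bytes) };

print $plain, $qq;
```

So the encode-first workaround only helps while Useqq stays at its default.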

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      (here I tried to paste the actual hiragana letter あ but it did not work out well inside the code markup.)
      You could use charnames, like this:
      print Dumper encode 'utf8', "Hiragana Letter \N{HIRAGANA LETTER A}";
      This works with PerlMonks, and if you use a web interface to your version control system, it will look nicer there than the "¿" or whatever gets displayed there.

      If your Perl is older than v5.16, it needs an explicit use charnames; to work.

      Thanks for the reply, grondilu.

      I don't think Data::Dumper changes anything about this issue.

      I see. This outputs Hiragana Letter A.

      perl -MData::Dumper -Mutf8 -MEncode=encode -e 'print Dumper encode("UTF-8","Hiragana Letter A ...")'
      $VAR1 = 'Hiragana Letter A ...';
      
      I have been using Data::Dumper carelessly so far, because it has been very handy for me. Now I am thinking of overriding the "Dumper" function, but that does not feel right.

      case 1: Anyway, override Dumper.

      use Data::Dumper;
      use Encode::Deep;
      no strict 'refs';
      no warnings 'redefine';
      local *Dumper = sub {
          print "in Dumper override\n";
          return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
      };
      use strict;
      use warnings;
      use utf8;
      print Dumper "Hiragana Letter A ...";
      Maybe, I should not do this.

      case 2: subclassing(?) Data::Dumper.
      I would like to make wrapper class like this.

      use strict;
      use warnings;
      {
          package MyDumper;
          use Any::Moose;
          use Data::Dumper ();   # load the module explicitly; nothing is imported
          use Encode::Deep;
          sub Dumper {
              my $self = shift;
              return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
          }
          1;
      }
      use utf8;
      my $d = MyDumper->new();
      print $d->Dumper("Hiragana Letter A ...");
      I could not "extends" Data::Dumper, because its new() function is forced to take 2 args ... (I should learn more about Data::Dumper...).

      Or do you know of any other module that takes care of encoding?

Re: utf8 characters in Data::Dumper
by Anonymous Monk on Nov 25, 2012 at 13:56 UTC

    I'd use JSON. It's very close to Perl in syntax, and it's Unicode by default.

    YAML is as well, but the syntax is harder :)
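A minimal sketch of the JSON suggestion, assuming the core JSON::PP module (bundled with Perl since v5.14); the sample hash is made up for illustration. The utf8 option makes encode() return UTF-8 bytes, and non-ASCII characters are left unescaped unless the ascii option is switched on.

```perl
use strict;
use warnings;
use JSON::PP;
use Encode qw(encode);

# Illustrative data: one value holds the character U+3042.
my $data = { letter => "\x{3042}", name => 'HIRAGANA LETTER A' };

# utf8      -> output is a UTF-8 encoded byte string, ready to print
# pretty    -> indented, human-readable layout
# canonical -> keys sorted, so the output is stable between runs
my $json = JSON::PP->new->utf8->pretty->canonical->encode($data);

print $json;
```

On a UTF-8 terminal the letter should then show up as itself rather than as an escape.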

Re: utf8 characters in Data::Dumper
by leszekdubiel (Scribe) on Jan 24, 2020 at 09:11 UTC

    Popular question, no good solutions found... My simple solution is:

    print Dumper(%mydata) =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;

      That's buggy. Consider the situation where someone has the string \x{0A}, such as the content of this post.
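A sketch of that failure, assuming a stock Data::Dumper and Perl v5.14 or later for the /r flag. The example uses lowercase \x{0a}, because the character class in the one-liner only matches lowercase hex digits. The second substitution is a hypothetical stricter variant, not something from this thread: it skips any \x{...} whose backslash is itself escaped. It is still a text-level hack on Dumper output rather than a real parser.

```perl
use strict;
use warnings;
use Data::Dumper;

# A wide character followed by the literal text \x{0a}
# (backslash, x, braces), which is not a newline.
my $data = "\x{3042} and a literal \\x{0a}";
my $dump = Dumper($data);

# The substitution from the one-liner: it also rewrites the literal text,
# leaving a stray backslash followed by a real newline in the dump.
my $naive = $dump =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;

# Hypothetical stricter variant: unescape \x{...} only when it is preceded
# by an even number of backslashes, i.e. its backslash is not escaped.
my $safer = $dump =~ s/(?<!\\)((?:\\\\)*)\\x\{([0-9a-f]+)\}/$1 . chr(hex $2)/ger;

# The results now hold wide characters, so encode them on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $naive, $safer;
```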