PerlMonks  

utf8 characters in Data::Dumper

by remiah (Hermit)
on Nov 25, 2012 at 01:59 UTC (#1005452=perlquestion)
remiah has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks.

When I print variables with Data::Dumper, it prints UTF-8 characters as escape sequences.

>perl -MData::Dumper -Mutf8 -e 'print Dumper "Hiragana Letter A .. for example ..."'
$VAR1 = "\x{3042}";
Is there a way to dump UTF-8 characters as they are? I have wondered about this several times and looked for an option like $Data::Dumper::Encoding='UTF-8', but I have not found one so far.

So I would like to ask the monks for suggestions and wisdom.

regards.

Re: utf8 characters in Data::Dumper
by grondilu (Pilgrim) on Nov 25, 2012 at 06:54 UTC

    Perl uses an internal format for strings, and it's not necessarily utf8. So whenever you want to output a non-ASCII string, you must explicitly encode it. I don't think Data::Dumper changes anything about this issue. It will output the string as it is encoded, so you really need to encode it first with an encoding supported by your terminal (and that will probably be utf8).

    use strict;
    use warnings;
    use Data::Dumper;
    use utf8;
    use Encode qw(encode);
    print Dumper encode 'utf8', "Hiragana Letter あ";

    (here I tried to paste the actual hiragana letter あ but it did not work out well inside the code markup.)

    IIRC the utf8 pragma only tells Perl that the source file of your program is encoded in UTF-8. It does not say anything about output to stdout.
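
A minimal sketch of the distinction described above, not taken from the thread: `use utf8` only affects how source literals are read, while output needs either an explicit `encode` or an encoding layer on the handle. The variable names are illustrative, and the letter is written as the escape `\x{3042}` to avoid paste problems.

```perl
use strict;
use warnings;
use utf8;                       # source literals are decoded from UTF-8
use Encode qw(encode);

my $text = "\x{3042}";          # HIRAGANA LETTER A, one character

# Option 1: encode by hand; the result is a byte string.
my $bytes = encode('UTF-8', $text);

# Option 2: put an encoding layer on the handle once,
# then print character strings directly.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";

printf "%d character, %d bytes\n", length($text), length($bytes);
```

Mixing the two options on the same handle would encode the text twice, so a program should probably pick one.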

      This is an interesting workaround. It "works" because it prevents Data::Dumper from seeing the hiragana letter at all. Instead it sees three separate bytes: 0xE3, 0x81, 0x82 and outputs them separately. The terminal then reads those bytes and, assuming it's set to display UTF-8, reassembles them into a single hiragana character.

      It breaks down if you set $Data::Dumper::Useqq to true.
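
A small sketch to illustrate the two points above, assuming a stock Data::Dumper: the encoded string really is three bytes, and turning on `$Data::Dumper::Useqq` makes Dumper escape those bytes instead of passing them through. The exact escape format may vary between Data::Dumper versions, so none is shown here.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(encode);

my $bytes = encode('UTF-8', "\x{3042}");    # HIRAGANA LETTER A

# Show the encoded string byte by byte.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";                             # E3 81 82

# Default settings: the raw bytes appear in the dump unchanged.
my $plain = do { local $Data::Dumper::Useqq = 0; Dumper($bytes) };

# With Useqq the high bytes are written as backslash escapes,
# so a UTF-8 terminal no longer sees a hiragana letter.
my $escaped = do { local $Data::Dumper::Useqq = 1; Dumper($bytes) };
print $escaped;
```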

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Thanks for the reply, grondilu.

      I don't think Data::Dumper changes anything about this issue.

      I see. This outputs Hiragana Letter A.

      perl -MData::Dumper -Mutf8 -MEncode=encode -e 'print Dumper encode("UTF-8","Hiragana Letter A ...")'
      $VAR1 = 'Hiragana Letter A ...';
      
      I have been using Data::Dumper carelessly so far, because it has been very handy for me. Now I am thinking of overriding the "Dumper" function, but that does not feel right.

      case 1: Anyway, override Dumper.

      use Data::Dumper;
      use Encode::Deep;
      no strict 'refs';
      no warnings 'redefine';
      local *Dumper = sub {
          print "in Dumper override\n";
          return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
      };
      use strict;
      use warnings;
      use utf8;
      print Dumper "Hiragana Letter A ...";
      Maybe, I should not do this.

      case 2: subclassing(?) Data::Dumper.
      I would like to make a wrapper class like this.

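      use strict;
      use warnings;
      {
          package MyDumper;
          use Any::Moose;
          use Data::Dumper ();   # load the module without importing Dumper
          use Encode::Deep;
          # Encode every string in the arguments, then hand them to Data::Dumper.
          sub Dumper {
              my $self = shift;
              return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
          }
          1;
      }
      use utf8;
      my $d = MyDumper->new();
      print $d->Dumper("Hiragana Letter A ...");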
      I could not "extends" Data::Dumper, because its new() function is forced to have 2 args ... (I should learn more about Data::Dumper...).

      Or do you know of any other module that takes care of encoding?

Re: utf8 characters in Data::Dumper
by tobyink (Abbot) on Nov 25, 2012 at 08:07 UTC

    It's a limitation of Data::Dumper. Try using a different dumping module, such as Data::Printer.


      Thanks tobyink.

      I would like to check out Data::Printer. Its "filter" feature looks nice to me.
      I was also going to look at other modules like FreezeThaw or YAML.

Re: utf8 characters in Data::Dumper
by Anonymous Monk on Nov 25, 2012 at 13:56 UTC

    I'd use JSON. It's very close to Perl in syntax, and it's Unicode by default.

    YAML is as well, but the syntax is harder :)
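
As a rough illustration of the JSON suggestion, here is a sketch that uses the core JSON::PP module (the reply does not say which JSON module was meant, so that choice is an assumption). With `->utf8` and without `->ascii`, the encoder returns UTF-8 bytes and leaves non-ASCII characters unescaped. The data structure is made up for the example.

```perl
use strict;
use warnings;
use JSON::PP;    # in the Perl core since 5.14

# Illustrative data; the letter is written as an escape to avoid paste problems.
my $data = { letter => "\x{3042}", list => [ 1, 2 ] };

# utf8:      encode() returns UTF-8 bytes, ready to print as-is
# pretty:    indented, human-readable output
# canonical: sorted hash keys, so the output is stable between runs
my $json = JSON::PP->new->utf8->pretty->canonical;

my $out = $json->encode($data);
print $out;
```

Because the result is already a byte string, STDOUT should not also carry an `:encoding(UTF-8)` layer here, or the text would be encoded twice.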
