PerlMonks  

Re: utf8 characters in Data::Dumper

by grondilu (Pilgrim)
on Nov 25, 2012 at 06:54 UTC


in reply to utf8 characters in Data::Dumper

Perl uses an internal format for strings, and it's not necessarily UTF-8. So whenever you want to output a non-ASCII string, you must explicitly encode it. I don't think Data::Dumper changes anything about this issue. It dumps the string exactly as it receives it, so you really need to encode the string first, using an encoding supported by your terminal (and that will probably be UTF-8).

use strict;
use warnings;
use Data::Dumper;
use utf8;
use Encode qw(encode);

print Dumper encode 'utf8', "Hiragana Letter あ";

(Here I tried to paste the actual hiragana letter あ, but it did not work out well inside the code markup.)

IIRC the utf8 pragma only tells Perl that the source file of your program is UTF-8 encoded. It does not say anything about output to STDOUT.
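
To make the split between source encoding and output encoding concrete, here is a minimal sketch. It is not from the original post: the variable names are made up, the letter is written as an escape so the file stays ASCII, and an in-memory filehandle stands in for STDOUT so the result can be inspected.

```perl
use strict;
use warnings;
use Encode qw(encode);

# HIRAGANA LETTER A written as an escape, so this file itself is plain
# ASCII and does not need "use utf8".
my $chars = "\x{3042}";

# Option 1: encode explicitly before printing, as in the post above.
my $bytes = encode('UTF-8', $chars);

# Option 2: let an I/O layer do the encoding. The in-memory filehandle
# is an assumption of this sketch; it stands in for STDOUT.
open my $fh, '>:encoding(UTF-8)', \my $buffer
    or die "cannot open in-memory handle: $!";
print {$fh} $chars;
close $fh;

printf "characters: %d, encoded bytes: %d, bytes written by layer: %d\n",
    length($chars), length($bytes), length($buffer);
```

On a real terminal the counterpart of option 2 would be binmode(STDOUT, ':encoding(UTF-8)'). Pick one option per string; applying both encodes the string twice.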


Re^2: utf8 characters in Data::Dumper
by tobyink (Abbot) on Nov 25, 2012 at 08:14 UTC

    This is an interesting workaround. It "works" because it prevents Data::Dumper from seeing the hiragana letter at all. Instead it sees three separate bytes: 0xE3, 0x81, 0x82 and outputs them separately. The terminal then reads those bytes and, assuming it's set to display UTF-8, reassembles them into a single hiragana character.

    It breaks down if you set $Data::Dumper::Useqq to true.
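
A small sketch of both points, assuming nothing beyond core modules (the variable names are only for illustration): it prints the bytes that Data::Dumper receives after encoding, then dumps them with and without $Data::Dumper::Useqq. With Useqq set, each byte should come out as its own octal escape, so the terminal no longer has anything to reassemble.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(encode);

$Data::Dumper::Terse = 1;    # print just the value, without "$VAR1 = "

my $chars = "\x{3042}";                 # one character, HIRAGANA LETTER A
my $bytes = encode('UTF-8', $chars);    # what Data::Dumper is handed

# Show the individual bytes in hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "bytes: $hex\n";

# Default settings: the raw bytes pass through inside single quotes.
my $plain = Dumper($bytes);
print "plain: $plain";

# With Useqq the same bytes are escaped one by one.
my $escaped = do {
    local $Data::Dumper::Useqq = 1;
    Dumper($bytes);
};
print "Useqq: $escaped";
```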

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re^2: utf8 characters in Data::Dumper
by remiah (Hermit) on Nov 25, 2012 at 09:01 UTC

    Thanks for the reply, grondilu.

    I don't think Data::Dumper changes anything about this issue.

    I see. This outputs Hiragana Letter A.

    perl -MData::Dumper -Mutf8 -MEncode=encode -e 'print Dumper encode("UTF-8","Hiragana Letter A ...")'
    $VAR1 = 'Hiragana Letter A ...';
    
    I have been using Data::Dumper carelessly so far, because it has been very handy for me. Now I am thinking of overriding the "Dumper" function, but that does not feel right.

    case 1: Anyway, override Dumper.

    use Data::Dumper;
    use Encode::Deep;

    no strict 'refs';
    no warnings 'redefine';
    local *Dumper = sub {
        print "in Dumper override\n";
        return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
    };

    use strict;
    use warnings;
    use utf8;
    print Dumper "Hiragana Letter A ...";
    Maybe I should not do this.

    case 2: subclassing(?) Data::Dumper.
    I would like to make a wrapper class like this.

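    use strict;
    use warnings;

    {
        package MyDumper;
        use Any::Moose;
        use Data::Dumper ();   # load the module without importing Dumper
        use Encode::Deep;

        # Encode all strings to UTF-8, then hand them to the real Dumper.
        sub Dumper {
            my $self = shift;
            return Data::Dumper::Dumper( Encode::Deep::encode('UTF-8', @_) );
        }
        1;
    }

    use utf8;
    my $d = MyDumper->new();
    print $d->Dumper("Hiragana Letter A ...");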
    I could not "extends" Data::Dumper, because its new() function insists on 2 args ... (I should learn more about Data::Dumper...).
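
A wrapper like this may not need a class, or a non-core module, at all. The following is only a rough sketch under stated assumptions: encode_deep and dumper_utf8 are hypothetical names, only arrays and hashes are walked, other references (objects, code refs) are passed through untouched, and cyclic structures are not handled.

```perl
use strict;
use warnings;
use Data::Dumper ();
use Encode ();

# Hypothetical helper: returns a copy of $data in which every plain
# string (including hash keys) is encoded to UTF-8 bytes.
sub encode_deep {
    my ($data) = @_;
    return $data unless defined $data;

    my $type = ref $data;
    if ($type eq 'ARRAY') {
        return [ map { encode_deep($_) } @$data ];
    }
    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            $copy{ Encode::encode('UTF-8', $key) } = encode_deep($data->{$key});
        }
        return \%copy;
    }
    return $data if $type;                    # other references: left as-is
    return Encode::encode('UTF-8', $data);    # plain scalar
}

# Hypothetical wrapper: encode first, then call the real Dumper.
sub dumper_utf8 {
    return Data::Dumper::Dumper(map { encode_deep($_) } @_);
}

print dumper_utf8({ letter => "\x{3042}" });
```

Like the encode-first workaround it is built on, this relies on Data::Dumper passing bytes through unchanged, so it stops helping once $Data::Dumper::Useqq is set.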

    Or do you know of any other module that takes care of encoding?
