Re: JSON::XS and unicode

by chip (Curate)
on Sep 08, 2012 at 20:57 UTC

in reply to JSON::XS and unicode

As is usual with Marc Lehmann's documentation, you must read it very carefully and take it literally. Note that while Perl's test for the Unicode flag is named "is_utf8()", the JSON::XS meaning of "utf8" is more correct -- it uses the term to refer to *bytes* that follow the UTF-8 encoding rules. And in decode_json() it's on by default.

Try turning it *off*. And, not to be snarky, but read the docs very carefully. You just have to.

    -- Chip Salzenberg, Free-Floating Agent of Chaos
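
The distinction chip describes can be sketched as follows. This is only an illustration under assumptions: it uses the core module JSON::PP as a stand-in so that it runs without installing anything (JSON::PP documents itself as interface-compatible with JSON::XS, so swapping the module name should behave the same way), and the sample JSON text and variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;                    # the literal below contains non-ASCII characters
use Encode qw( encode );
use JSON::PP ();             # assumption: stands in for JSON::XS in this sketch

# A decoded (character) string, similar to the one in the question.
my $chars = '{"creche": "crèche", "₡": "волн"}';

# utf8(1) is what decode_json() uses: the input must be UTF-8 *bytes*,
# so handing it characters is expected to croak.
my $bytes_mode = JSON::PP->new->utf8(1);
my $from_chars = eval { $bytes_mode->decode($chars) };
my $croaked    = defined $from_chars ? 0 : 1;

# Encoding to bytes first lets the same decoder accept the text.
my $from_bytes = $bytes_mode->decode( encode( 'UTF-8', $chars ) );

# utf8(0): the decoder takes the character string as it is.
my $char_mode       = JSON::PP->new->utf8(0);
my $from_chars_off  = $char_mode->decode($chars);

# Both routes should produce the same decoded value.
my $same = $from_bytes->{creche} eq $from_chars_off->{creche} ? 1 : 0;

print "characters rejected with utf8(1): $croaked\n";
print "bytes route and utf8(0) route agree: $same\n";
```

Either fix works; which one to pick depends on whether the rest of the program already holds the JSON text as characters or as bytes.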

Replies are listed 'Best First'.
Re^2: JSON::XS and unicode
by kimmel (Beadle) on Sep 08, 2012 at 23:01 UTC

    Re-reading the JSON::XS documentation, I found one key word I had missed the first time around. Here is the relevant POD with my emphasis added.

    $perl_scalar = decode_json $json_text

    The opposite of encode_json: expects an UTF-8 (binary) string and tries to parse that as an UTF-8 encoded JSON text, returning the resulting reference. Croaks on error.
    So I need to encode before passing it to decode_json(). Here is the working program:

        #!/usr/bin/perl
        use v5.14;
        use warnings;
        use utf8::all;
        use Encode;
        use JSON::XS qw( decode_json );

        my $wl = '{"creche": "crèche", "¥": "£", "₡": "волн" }';
        my $pattern_list = decode_json( encode("utf8", $wl) );

Re^2: JSON::XS and unicode
by remiah (Hermit) on Sep 08, 2012 at 22:45 UTC

    There seem to be two problems.

    The first is that JSON::XS expects an encoded UTF-8 string by default, as you point out.
    The second is that utf8::all doesn't affect File::Slurp's I/O layer. When the OP writes the text to a file and reads it back with File::Slurp's read_file, the result is encoded UTF-8 bytes, not a decoded character string. That is why the second example appears to succeed at first glance.

        use strict;
        use warnings;
        use JSON::XS qw( decode_json );
        use Data::Dumper;

        binmode(STDOUT, ":encoding(UTF-8)");

        sub _p { return pack('U', $_[0]) }

        my ($wl, $pattern_list);

        # create a decoded (Perl-internal, utf8-flagged) JSON string
        $wl  = '{"creche": "cr' . _p(0xE8) . 'che",';
        $wl .= '"' . _p(0xA5) . '" : "' . _p(0xA3) . '",';
        $wl .= '"' . _p(8353) . '": "' . _p(1074) . _p(1086) . _p(1083) . _p(1085) . '"';
        $wl .= '}';

        # example 1 of OP
        sub ex1 {
            my $pattern_list;
            #$pattern_list = decode_json($wl);                    # Wide character in subroutine entry
            #$pattern_list = JSON::XS->new->utf8(1)->decode($wl); # Wide character in subroutine entry
            # no warning: it seems this module expects encoded utf8, not decoded utf8, by default
            $pattern_list = JSON::XS->new->utf8(0)->decode($wl);
        }
        #ex1();
        #print Dumper $pattern_list;

        # example 2
        sub ex2 {
            use File::Slurp qw( read_file );
            use utf8::all;
            open my $fh, '>:encoding(UTF-8)', 'test_file2'
                or die "cannot open test_file2: $!";
            print {$fh} $wl;
            close $fh;
            # here utf8::all failed to set Slurp's io layer
            my $buffer = read_file('test_file2');
            print utf8::is_utf8($buffer) ? "buffer:utf8 flagged\n" : "buffer:not utf8 flagged\n";
            # you get encoded utf8 bytes, and that is what decode_json expects by default
            $pattern_list = decode_json($buffer);
            # note: $pattern_list is a hash reference, so is_utf8() reports
            # "not flagged" here regardless of the strings stored inside it
            print utf8::is_utf8($pattern_list) ? "pattern:utf8 flagged\n" : "pattern:not utf8 flagged\n";
        }
        ex2();
        print Dumper $pattern_list;

    JSON::XS's utf8 option seems to me to behave very differently from the corresponding options in other modules, such as the DBD:: drivers or Template's binmode option.
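
remiah's second point, that a file read without a decoding layer yields encoded bytes, can be checked in isolation. The sketch below is an illustration only: it uses plain open() with explicit layers instead of File::Slurp, a temporary file from File::Temp, and the core module JSON::PP as an assumed stand-in for JSON::XS; the sample text and variable names are invented for the example.

```perl
use strict;
use warnings;
use utf8;                         # the literal below contains non-ASCII characters
use File::Temp qw( tempfile );
use JSON::PP qw( decode_json );   # assumption: stands in for JSON::XS here

my $wl = '{"creche": "crèche", "₡": "волн"}';

# Write the character string through an encoding layer, as ex2 does.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(UTF-8)';
print {$out} $wl;
close $out or die "close failed: $!";

# Read with no decoding layer: the result is UTF-8 encoded bytes.
open my $raw, '<:raw', $path or die "open failed: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read through a decoding layer: the result is a character string.
open my $dec, '<:encoding(UTF-8)', $path or die "open failed: $!";
my $chars = do { local $/; <$dec> };
close $dec;

my $bytes_flagged = utf8::is_utf8($bytes) ? 1 : 0;
my $chars_flagged = utf8::is_utf8($chars) ? 1 : 0;

# decode_json() wants the byte form and returns decoded characters.
my $data = decode_json($bytes);

print "raw read flagged: $bytes_flagged, layered read flagged: $chars_flagged\n";
print "length of decoded value: ", length( $data->{creche} ), "\n";
```

The byte string is longer than the character string because each non-ASCII character takes two or three bytes in UTF-8, which is a quick way to tell the two forms apart when debugging.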
