PerlMonks  

Re: JSON::XS and unicode

by chip (Curate)
on Sep 08, 2012 at 20:57 UTC (node #992529)


in reply to JSON::XS and unicode

As is usual with Marc Lehmann's documentation, you must read it very carefully and take it literally. Note that while Perl's test for the internal Unicode flag is named "is_utf8()", the JSON::XS meaning of "utf8" is more correct: it uses the term to refer to *bytes* that follow the UTF-8 encoding rules. And decode_json() has it switched on, since that function is equivalent to JSON::XS->new->utf8->decode.

Try turning it *off*, which means using the object interface, e.g. JSON::XS->new->utf8(0)->decode($text). And, not to be snarky, but read the docs very carefully. You just have to.
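The distinction above can be seen in a small sketch. It assumes JSON::PP, the pure-Perl module shipped with Perl that documents the same utf8 option as JSON::XS, so it runs without a C compiler; the variable names are illustrative. Two equal strings can report different utf8::is_utf8() results, while the JSON "utf8" option decides whether the input is read as UTF-8 bytes or as characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );
use JSON::PP;    # core module; assumed stand-in for JSON::XS's utf8 option

# Two copies of the same 6-character string "crèche".
my $plain = "cr\x{E8}che";
utf8::downgrade($plain);     # make sure the internal UTF8 flag is off
my $upgraded = $plain;
utf8::upgrade($upgraded);    # same characters, internal UTF8 flag now on

# is_utf8() reports Perl's internal flag, not whether the data "is UTF-8".
printf "equal: %s, flags: %d/%d\n",
    ( $plain eq $upgraded ? 'yes' : 'no' ),
    ( utf8::is_utf8($plain)    ? 1 : 0 ),
    ( utf8::is_utf8($upgraded) ? 1 : 0 );

# JSON's "utf8" means UTF-8 encoded *bytes*: 7 of them for this string.
my $bytes = encode( 'UTF-8', $plain );

# utf8 on: the input is bytes and is decoded as UTF-8, giving 6 characters.
my $ok = JSON::PP->new->utf8(1)->decode(qq(["$bytes"]))->[0];

# utf8 off: the input is taken as characters, so the same bytes turn into
# 7 characters of mojibake ("crÃ¨che").
my $mojibake = JSON::PP->new->utf8(0)->decode(qq(["$bytes"]))->[0];

printf "utf8 on: %d chars, utf8 off: %d chars\n", length $ok, length $mojibake;
```

Feeding UTF-8 bytes to a decoder with utf8 turned off is one common way to end up with double-encoded text, so the option has to match what you actually hold.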

    -- Chip Salzenberg, Free-Floating Agent of Chaos


Re^2: JSON::XS and unicode
by remiah (Hermit) on Sep 08, 2012 at 22:45 UTC

    Hello.
    There seem to be two problems.

    First, JSON::XS expects an 'encoded utf8' string by default, as you point out.
    Second, utf8::all doesn't affect File::Slurp's I/O layer. When the OP writes the JSON to a file and reads it back with File::Slurp's read_file, the result is 'encoded utf8' (bytes), not 'decoded utf8' (characters). That is why the second example appears to succeed at first glance.

        use strict;
        use warnings;
        use JSON::XS qw( decode_json );
        use Data::Dumper;

        binmode(STDOUT, ":encoding(UTF-8)");

        sub _p { return pack('U', $_[0]) }

        my ($wl, $pattern_list);

        # build a decoded (Perl-internal utf8) JSON character string
        $wl  = '{"creche": "cr' . _p(0xE8) . 'che",';
        $wl .= '"' . _p(0xA5) . '" : "' . _p(0xA3) . '",';
        $wl .= '"' . _p(8353) . '": "' . _p(1074) . _p(1086) . _p(1083) . _p(1085) . '"';
        $wl .= '}';

        # example 1 of the OP
        sub ex1 {
            #$pattern_list = decode_json($wl);                     # Wide character in subroutine entry
            #$pattern_list = JSON::XS->new->utf8(1)->decode($wl);  # Wide character in subroutine entry
            # no warning: the module expects encoded utf8 by default,
            # but with utf8(0) it accepts our decoded string
            $pattern_list = JSON::XS->new->utf8(0)->decode($wl);
        }
        #ex1();
        #print Dumper $pattern_list;

        # example 2
        sub ex2 {
            use File::Slurp qw( read_file );
            use utf8::all;

            open my $fh, '>:encoding(UTF-8)', 'test_file2' or die "open test_file2: $!";
            print {$fh} $wl;
            close $fh;

            # here utf8::all failed to set File::Slurp's I/O layer
            my $buffer = read_file('test_file2');
            print utf8::is_utf8($buffer) ? "buffer:utf8 flagged\n" : "buffer:not utf8 flagged\n";

            # you get encoded utf8 bytes, which is what decode_json expects
            $pattern_list = decode_json($buffer);

            # decode_json returns a hash reference, so check one of its values
            print utf8::is_utf8($pattern_list->{creche}) ? "pattern:utf8 flagged\n" : "pattern:not utf8 flagged\n";
        }
        ex2();
        print Dumper $pattern_list;
    To me, JSON::XS's idea of utf8 seems very different from that of other modules, such as the DBD:: drivers or Template's binmode option.
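remiah's second point can be reproduced without File::Slurp. This sketch is a stand-in under several assumptions: it imitates read_file's byte-oriented read with a plain open in :raw mode (File::Slurp itself is not used), writes to a File::Temp file instead of test_file2, and uses the core JSON::PP module in place of JSON::XS; the slurp() helper is hypothetical. The raw read hands decode_json() the bytes it wants; reading through an :encoding(UTF-8) layer instead yields characters, which need the utf8 option turned off.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use JSON::PP qw( decode_json );    # assumed core stand-in for JSON::XS

# JSON text as a character string (escapes avoid needing "use utf8").
my $wl = qq({"creche": "cr\x{E8}che", "\x{20A1}": "\x{432}\x{43E}\x{43B}\x{43D}"});

# Write it out as UTF-8, as the OP's example does.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(UTF-8)';
print {$out} $wl;
close $out or die "close: $!";

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp {
    my ( $file, $layer ) = @_;
    open my $in, "<$layer", $file or die "open $file: $!";
    local $/;                      # undef record separator: read everything
    my $content = <$in>;
    close $in;
    return $content;
}

# Raw read: UTF-8 encoded bytes, which is what decode_json() wants.
my $bytes           = slurp( $path, ':raw' );
my $data_from_bytes = decode_json($bytes);

# Decoding read: a character string, so switch the utf8 option off instead.
my $chars           = slurp( $path, ':encoding(UTF-8)' );
my $data_from_chars = JSON::PP->new->utf8(0)->decode($chars);

printf "raw read is utf8-flagged: %s\n", utf8::is_utf8($bytes) ? 'yes' : 'no';
printf "same result: %s\n",
    $data_from_bytes->{creche} eq $data_from_chars->{creche} ? 'yes' : 'no';
```

Either path works; what matters is that the decoder's utf8 setting matches whether the layer already decoded the text.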

Re^2: JSON::XS and unicode
by kimmel (Beadle) on Sep 08, 2012 at 23:01 UTC

    Re-reading the JSON::XS documentation, I found one key word I missed the first time around. Here is the relevant POD, with my emphasis added.

    $perl_scalar = decode_json $json_text

    The opposite of encode_json: expects an UTF-8 (binary) string and tries to parse that as an UTF-8 encoded JSON text, returning the resulting reference. Croaks on error.
    So I need to encode the string before passing it to decode_json(). Here is the working program:
        #!/usr/bin/perl
        use v5.14;
        use warnings;
        use utf8::all;
        use Encode;
        use JSON::XS qw( decode_json );

        my $wl = '{"creche": "crèche", "¥": "£", "₡": "волн" }';
        my $pattern_list = decode_json( encode("utf8", $wl) );
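Both ways out of the original error can be checked side by side. As a sketch, assuming JSON::PP as a core stand-in for JSON::XS (it exports decode_json with the same byte-oriented contract), the program below shows decode_json() croaking on a character string, then kimmel's encode-first fix and the alternative of turning utf8 off on an object. The names $direct, $via_bytes and $via_chars are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the JSON literal below contains non-ASCII characters
use Encode qw( encode );
use JSON::PP qw( decode_json );    # assumed core stand-in for JSON::XS

my $wl = '{"creche": "crèche", "¥": "£", "₡": "волн" }';

# Passing the character string straight in fails: decode_json() wants bytes.
my $direct = eval { decode_json($wl) };
print "decode_json on characters failed: $@" unless defined $direct;

# Fix 1, as in the program above: encode to UTF-8 bytes first.
my $via_bytes = decode_json( encode( 'UTF-8', $wl ) );

# Fix 2: keep the character string and switch the utf8 option off.
my $via_chars = JSON::PP->new->utf8(0)->decode($wl);

# Both fixes should yield identical data structures.
for my $key ( sort keys %$via_chars ) {
    die "mismatch for key $key" unless $via_chars->{$key} eq $via_bytes->{$key};
}
print "both fixes produce the same data\n";
```

Encoding first suits code that already calls decode_json(); turning utf8 off avoids an encode/decode round trip when the text is already decoded, for example when it came through an :encoding(UTF-8) layer.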
