http://www.perlmonks.org?node_id=976956

ron.savage has asked for the wisdom of the Perl Monks concerning the following question:

1) Versions: Postgres V 8.4.12. Perl V 5.14.2. DBI V 1.622. DBD::Pg V 2.19.2. DBIx::Class V 0.08196. JSON::XS V 2.32. JSON::Syck not tried. Plack V 0.9988. YUI V 3.5.1.

2) Postgres database creation command line:

psql=# create database novels owner ron encoding 'UTF8';

3) CSV file (606 lines, 1 sample):

"author","category","title","rating","comment","isbn","publisher","publication_date","review_date"
"Colm Tóibín","Novel","The South","***","-","-","Picador","-","2012-06-12"

4) Importing from that CSV file and exporting to Postgres:

use feature 'unicode_strings';
use open qw/:std :utf8/;
...
# DBIx::Class:
my($rs) = $schema -> resultset('Author');
my($result);

for (sort keys %$data)
{
    $result = $rs -> create({name => $_, upper_name => uc $_});
}

Note: Encode qw/decode encode/ not used.

5) Postgres search command line:

novels=# select * from authors where name like 'Colm%';
 id  |    name     | upper_name
-----+-------------+-------------
 100 | Colm Tóibín | COLM TóIBíN
(1 row)

So far, so good.

6) Perl command line test script to read db:

use feature qw/say unicode_strings/;
use open qw/:std :utf8/;
use Encode qw/decode encode/;
...
my($row)    = $sth -> fetchall_hashref('id');
my($name)   = $$row{100}{name};
my($decode) = decode('utf8', $name);
my($json)   = JSON::XS -> new -> utf8(0) -> encode({name => $decode});

say "name: $name.";
say "decode: $decode.";
say "json: $json.";

7) Output of (6):

ron@zigzag:~/perl.modules/Local-Novels$ perl scripts/test.utf8.pl
name: Colm Tóibín.
decode: Colm Tóibín.
json: {"name":"Colm Tóibín"}.

So far, so good.

8) Conclusion (after trying many combinations :-(): I need JSON's utf8(0) and encode() acting on the decoded database field to get the expected JSON output: json: {"name":"Colm Tóibín"}.

9) But, if I use decode('utf8', ...) and utf8(0) for an AJAX call under Plack here's what happens. The HTML page contains:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Now for the Perl:

use feature 'unicode_strings';
use Encode 'decode';
...
while (my $item = $rs -> next)
{
    push @$result,
    {
        author_name => decode('utf8', $item -> author -> name),
        ...
    }
}
...
$output = {results => $result};

return JSON::XS -> new -> utf8(0) -> encode($output);

Plack reports: Body must be bytes and should not contain wide characters (UTF-8 strings) at...

If I keep Plack happy with:

return JSON::XS -> new -> utf8(1) -> encode($output);
The displayed value for author's name is: Colm Tóibín

What to do?

Replies are listed 'Best First'.
Re: utf8/yui/json/ajax/plack troubles
by Corion (Patriarch) on Jun 19, 2012 at 06:39 UTC

    There are two points to (easily) check whether you have a wrong encoding there:

    • What headers does the JSON HTTP request return? Check this with wget, curl or LWP GET.
    • What does the browser think the page is in, and what does it think about the JSON data?

    If you're returning JSON mixed with HTML, make sure that the HTML, headers and JSON all have the same encoding.
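A small sketch of what such a mismatch looks like, assuming the server sends correct UTF-8 bytes and the client guesses ISO-8859-1 because no charset was declared. The variable names are illustrative only, and the name is written with escapes so the script itself stays ASCII:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Character string for the author's name (escapes keep this file ASCII-only).
my $name = "Colm T\x{F3}ib\x{ED}n";

# What a server sends when it encodes the body as UTF-8.
my $bytes = encode('UTF-8', $name);

# What a client ends up with if it assumes ISO-8859-1 (assumed wrong guess).
my $misread = decode('ISO-8859-1', $bytes);

# What a client ends up with if it is told the body is UTF-8.
my $correct = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "misread: $misread\n";
print "correct: $correct\n";
```

The misread line shows each accented letter as two characters, which is the same kind of garbage reported in the question for the utf8(1) case.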

      Hi. Thanx for the suggestion about examining the headers. After setting the charset in the Content-Type header before returning the JSON, it worked. Phew! Cheers, Ron
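For readers hitting the same problem, here is a minimal sketch of a PSGI response that declares its charset. It is not the code from the original application: the handler, the sample row and the exact media type are assumptions, and JSON::PP (core, with the same interface as JSON::XS for these calls) is used so the sketch runs without extra modules.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical handler: $results stands in for rows built from the database,
# already decoded into Perl character strings.
my $app = sub {
    my ($env) = @_;
    my $results = [ { author_name => "Colm T\x{F3}ib\x{ED}n" } ];

    # utf8(1) makes encode() return UTF-8 encoded bytes, which PSGI requires.
    my $body = JSON::PP->new->utf8(1)->encode({ results => $results });

    return [
        200,
        [
            # Assumed media type; the point is the explicit charset parameter.
            'Content-Type'   => 'application/json; charset=utf-8',
            'Content-Length' => length $body,
        ],
        [ $body ],
    ];
};
```

The body is bytes, so the Lint check quoted in the question should not fire, and the header tells the browser how to decode those bytes.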
Re: utf8/yui/json/ajax/plack troubles
by Anonymous Monk on Jun 19, 2012 at 07:13 UTC

    Consider this :) (look at the error message, ignore the choice of bytes)

    $ perl -MEncode -MJSON -e " print JSON->new->utf8(0)->pretty(1)->encode([ decode 'utf8', qq{\xFA\xFG} ]); "
    Wide character in print at -e line 1.
    [
       "∩┐╜\u000fG"
    ]
    $ perl -MEncode -MJSON -e " print JSON->new->ascii(1)->pretty(1)->encode([ decode 'utf8', qq{\xFA\xFG} ]); "
    [
       "\ufffd\u000fG"
    ]

    The error Body must be bytes and should not contain wide characters (UTF-8 strings) comes from Plack::Middleware::Lint, and I remember a similar message from LWP/HTTP::Message. I've learned from perlunitut: Unicode in Perl#I/O flow (the actual 5 minute tutorial) that this means you have to write

    return encode 'UTF-8', JSON->...

    I write JSON because JSON loads JSON::XS if it's available, but the code doesn't break if JSON::XS isn't available for some reason :)

    This is confirmed by PSGI::FAQ#I want to send Unicode content in the HTTP response. How can I do so?

    I wouldn't be surprised if there exists a Plack::Middleware::Encoding or some such which automatically encodes your UTF-8 strings and adds a charset header, but I haven't seen one.
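A short sketch of the two routes to a byte string discussed in this reply: encoding the JSON text explicitly with Encode, or letting the JSON encoder do it via utf8(1). JSON::PP is used here only because it ships with Perl; that choice and the sample data are assumptions, and for these calls its interface matches JSON and JSON::XS.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP ();

my $data = { name => "Colm T\x{F3}ib\x{ED}n" };

# Route 1: JSON as a character string, then encode to bytes explicitly.
my $chars   = JSON::PP->new->utf8(0)->encode($data);
my $bytes_a = encode('UTF-8', $chars);

# Route 2: let the JSON encoder produce UTF-8 bytes directly.
my $bytes_b = JSON::PP->new->utf8(1)->encode($data);

print "same bytes\n" if $bytes_a eq $bytes_b;
printf "%d characters, %d bytes\n", length $chars, length $bytes_a;
```

Either route gives Plack a body made of bytes. The character string from utf8(0) is what you want for printing through an encoding layer, as in the command line test in the question, but not for a PSGI body.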

Re: utf8/yui/json/ajax/plack troubles
by Your Mother (Archbishop) on Jun 20, 2012 at 02:39 UTC

    I didn’t read for comprehension but noticed this–

       100 | Colm Tóibín | COLM TóIBíN

    –upon which you remarked, so far so good. It’s wrong of course. It should read TÓIBÍN and would if your strings were properly decoded prior to the uc operation. So regardless of anything else that’s going wrong, the population of your $data is broken.

    use warnings;
    use strict;
    use Encode;
    use open qw( :std :utf8 );

    my $start = "Colm Tóibín";
    print decode("UTF-8", uc $start), $/;
    print uc decode("UTF-8", $start), $/;
    __END__
    COLM TóIBíN
    COLM TÓIBÍN

      Hi. Hehehe. I know. But that wasn't the question. Thanx anyway. Cheers, Ron