PerlMonks  

special characters in parsed json rendering badly in browser

by slugger415 (Monk)
on Sep 01, 2018 at 20:30 UTC (#1221554)

slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a simple JSON query (to Facebook events) whose result contains smart quotes and other special characters that render badly in the browser. I've tried utf8::decode on the relevant field but that seems to wipe it out completely (it returns nothing).

code snippet:

use JSON qw( decode_json );
use LWP::Simple;
use utf8;
use strict;

my($url) = "https://graph.facebook.com/$id?access_token=" . $token;
my($json) = get($url);
my($decoded) = decode_json($json);
$event{'desc'} = utf8::decode($decoded->{'description'});

Without the utf8::decode I get all the description but with bad character rendering, e.g. "Action Films" (with smart quotes) looks like:

“Action filmsâ€

But again, the utf8::decode seems to wipe it out.

thanks, Scott

Replies are listed 'Best First'.
Re: special characters in parsed json rendering badly in browser
by tinita (Parson) on Sep 02, 2018 at 09:21 UTC

      Result of Dump:

      SV = PV(0xcdc0b8) at 0x2522f38
        REFCNT = 1
        FLAGS = (POK,pPOK,UTF8)
        PV = 0x3f44048 "Saturday, September 22, 2018\n7:30pm Doors / 8pm Performances\n $15 Guests / $10 for members: .....etc.

      and at the end:

        CUR = 3552
        LEN = 3554

      thanks

Re: special characters in parsed json rendering badly in browser
by Corion (Pope) on Sep 03, 2018 at 08:02 UTC

    According to the JSON documentation, JSON::decode_json expects raw octets. It also returns content that is already UTF-8 decoded, so your additional decode step should not be necessary.
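
    A minimal sketch of that point, assuming JSON::PP (it ships with Perl and offers the same decode_json interface as JSON) and a hard-coded sample payload rather than the real Facebook response. It also illustrates that utf8::decode works in place and returns only a status flag, so assigning its return value stores the flag instead of the text:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # assumption: stands in for JSON's decode_json

# Sample payload (hypothetical): raw UTF-8 octets for curly double quotes
my $json = qq({"description":"\xE2\x80\x9CAction films\xE2\x80\x9D"});

# decode_json takes octets and returns Perl character strings
my $decoded = decode_json($json);
my $text    = $decoded->{description};

printf "characters: %d, first code point: U+%04X\n",
    length($text), ord($text);

# utf8::decode modifies its argument in place and returns a status flag,
# so assigning the return value keeps the flag, not the description
my $copy   = $text;
my $status = utf8::decode($copy);
print "status is ", ($status ? "true" : "false"), "\n";
```

    If this reading is right, the description in the original snippet could simply be assigned from $decoded->{'description'} with no further decode call.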

    Have you looked at the octets you download and have you verified that your problem is not in the further treatment or output of the data?

    Personally, I find it helpful to look at hexdumps of the octets to verify that the proper data is written to the console.

    I see that nowhere in the example code do you call binmode STDOUT, ':encoding(UTF-8)'; maybe that would be a good step?
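
    A small sketch of both suggestions, using only core modules. The helper name hex_octets is made up for illustration; the sample character is the left single quotation mark discussed elsewhere in this thread:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $octets);
}

my $char   = "\x{2018}";              # LEFT SINGLE QUOTATION MARK
my $octets = encode('UTF-8', $char);  # character string -> UTF-8 octets

print hex_octets($octets), "\n";      # prints "e2 80 98"

# With the layer set, character strings are encoded on the way out
binmode STDOUT, ':encoding(UTF-8)';
print "$char\n";
```

    Running the same helper over the downloaded JSON would show whether the octets are already wrong on arrival or only get mangled on output.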

Re: special characters in parsed json rendering badly in browser
by Anonymous Monk on Sep 02, 2018 at 03:18 UTC
    Try decoding it with Encode:
    use Encode; $event{'desc'} = Encode::decode(utf8 => $decoded->{'description'});
    You mention a browser so be sure to specify the charset:
    Content-type: text/html; charset=utf-8
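
    As a rough sketch of the charset advice, assuming a CGI-style script that prints its own header (the helper name html_response is hypothetical): the declared charset and the encoding applied to the body should match.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a CGI-style response whose declared charset
# matches the encoding actually applied to the body
sub html_response {
    my ($body) = @_;
    return "Content-Type: text/html; charset=utf-8\r\n\r\n"
         . encode('UTF-8', $body);
}

# Body is a character string; it is encoded to octets exactly once
my $response = html_response("<p>\x{201C}Action films\x{201D}</p>");
print $response;
```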

      result of Encode:

      Wide character at C:/Strawberry/perl/lib/Encode.pm line 228.

      And the script stops there.

      charset in Firefox Windows iso-8859-1

      not sure where that gets specified

        Is the browser showing the contents of a file or the response from a web server ?

        poj
Re: special characters in parsed json rendering badly in browser
by slugger415 (Monk) on Sep 02, 2018 at 19:52 UTC

    One thing I might have mentioned is the json returns a kind of encoding I'm not familiar with, for example these appear to be smart quotes:

    what \x{2018}film about film\x{2019} or \x{2018}film as film\x{2019} might mean.

    I'm guessing the JSON module is decoding those somehow in a way I don't want.

      \x refers to hex encoding, so \x{2018} is a two-byte encoding of chr(32) and chr(30) (32 decimal = 20 hex). So it's one of the 16 bit character encodings, though from this alone I am unsure which. See replies for more info.
        so \x{2018} is a two-byte encoding of chr(32) and chr(30)

        Sorry, but no, this is incorrect. \x{2018} is not valid JavaScript, so I'm assuming it's Perl. In Perl, \x{2018} is interpreted as the Unicode character U+2018, LEFT SINGLE QUOTATION MARK, decimal 8216. Although one normally shouldn't have to worry about the internal encoding Perl uses, here it is anyway:

        $ perl -CSD -MDevel::Peek -le 'my $x="\x{2018}"; Dump($x); print "<$x>"'
        SV = PV(0x2147e80) at 0x21675b0
          REFCNT = 1
          FLAGS = (POK,IsCOW,pPOK,UTF8)
          PV = 0x21694e0 "\342\200\230"\0 [UTF8 "\x{2018}"]
          CUR = 3
          LEN = 10
          COW_REFCNT = 1
        <‘>
        

        Minor fixes.

Re: special characters in parsed json rendering badly in browser
by slugger415 (Monk) on Sep 09, 2018 at 13:55 UTC

    Thank you all for your suggestions, but I don't feel I'm any closer with this problem, partly my own fault for not being clear about what I want. (Which I'm figuring out as I puzzle it out.) I'd like the text/data to be portable to other applications that can read HTML.

    I did figure out (thanks poj) that I can set the encoding in the HTML page to have it display correctly in the browser. But if that text gets posted to another page or app I don't necessarily have control over its page encoding.

    So what I really want is to HTML-encode those characters. But when I try HTML::Entities on, say, the smart apostrophe:

    use HTML::Entities;
    my($text) = "Kren’s 89th birthday";
    print encode_entities($text), $/;

    that one character gets converted to three HTML entities:

    Kren&acirc;&#128;&#153;s 89th birthday

    which displays a lot of garbage in the browser.
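
    One plausible reading of that output, assuming the script file is saved as UTF-8 and has no use utf8: Perl sees the apostrophe as three separate bytes (e2 80 99) and each byte is treated as its own Latin-1 character, which would give &acirc; (0xE2), &#128; (0x80) and &#153; (0x99). A sketch with the bytes spelled out:

```perl
use strict;
use warnings;

# The literal as Perl sees it without "use utf8": three bytes for the apostrophe
my $bytes = "Kren\xE2\x80\x99s";
print length($bytes), "\n";    # prints 8

# Decoding in place turns the three bytes into one character, U+2019
my $chars = $bytes;
utf8::decode($chars);
print length($chars), "\n";    # prints 6
printf "U+%04X\n", ord(substr($chars, 4, 1));   # prints U+2019
```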

    So I'm lost as to what's going on or how to resolve it. Perl is rendering the JSON string as a smart quote but HTML::Entities is improperly encoding it.

    So far my best solution seems to be:

    $event{'desc'} =~ s/’/\&\#39\;/g;
    $event{'desc'} =~ s/–/-/g;
    $event{'desc'} =~ s/—/ - /g;
    $event{'desc'} =~ s/‘/'/g;
    $event{'desc'} =~ s/'/'/g;
    $event{'desc'} =~ s/“/"/g;
    $event{'desc'} =~ s/”/"/g;

    but of course that only handles characters I'm aware of.
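
    A generic alternative to listing characters one by one, sketched with a plain regex substitution rather than HTML::Entities so that it needs nothing outside core Perl. The helper name numeric_entities is made up, and the sketch assumes the input is already a decoded character string (as decode_json returns), not UTF-8 bytes. It leaves &, < and > untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a numeric
# character reference. Expects a decoded character string, not UTF-8 bytes.
sub numeric_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

print numeric_entities("Kren\x{2019}s 89th birthday"), "\n";
# prints "Kren&#8217;s 89th birthday"
```

    The output is pure ASCII, so it should display the same regardless of the charset the receiving page declares.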

    Thoughts? Thanks for your patience.

    Scott

      see utf8 - The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope.

      use utf8;
      use HTML::Entities;
      my($text) = "Kren’s 89th birthday";
      print encode_entities($text), $/;
      # result right single quote
      # Kren&rsquo;s 89th birthday
      poj

        That's it! That's all I needed, thank you!
