PerlMonks  

PgPP invalid byte sequence for encoding UTF8

by cormanaz (Chaplain)
on Mar 24, 2013 at 14:15 UTC (#1025145)
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day bros. I am trying to retrieve some user information from Twitter and store it in a psql db. The data comes back as a JSON string, which I am decoding with the JSON module. Everything goes well until I try to insert the record into a psql db table (which has encoding UTF8) using the PgPP driver.

The string causing the problem is (undecoded, in the JSON string returned by Twitter):

    En d\u00e9mocratie, on a le droit d'avoir tort

This looks fine in the Komodo debugger once decoded, i.e. it shows the accented e. When I go to insert, I get:

    DBD::PgPP::st execute failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xe96d6f

Does anyone know what's going wrong here, or how to fix it?

Re: PgPP invalid byte sequence for encoding UTF8
by McA (Deacon) on Mar 24, 2013 at 14:38 UTC

    Hi,

    What does the psql driver expect as input? A UTF-8 encoded string?

    When you do the following:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Data::Dumper;
        use JSON;

        my $json = q({"my": "En d\u00e9mocratie, on a le droit d'avoir tort"});
        my $perl = decode_json $json;
        my $flag_utf8 = utf8::is_utf8($perl->{'my'}) ? 1 : 0;
        print "is unicode: $flag_utf8\n";
        print Dumper($perl), "\n";

    you will see that after decoding the JSON you probably get strings which are marked as Unicode. As soon as you send these strings to some "output" (a database is one of the many output channels), you have to ask which encoding is expected. That means: if psql expects UTF-8 encoding, you have to encode the string appropriately with Encode::encode('UTF-8', $string);

    McA
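
A minimal sketch of why the encode step matters, using the same JSON fragment as above. It assumes (this is not confirmed anywhere in the thread) that the un-encoded string reached the server as one byte per character, which would match the 0xe96d6f in the error message. JSON::PP is used here only because it ships with Perl; its decode_json should behave the same as the JSON module's for this input. The helper hex_at is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;               # core module; stands in for JSON here
use Encode qw(encode);

# Same JSON fragment as in the question; q() leaves the \u00e9 escape untouched.
my $json = q({"my": "En d\u00e9mocratie, on a le droit d'avoir tort"});
my $text = decode_json($json)->{my};    # a Perl character string

# Hypothetical helper: hex dump of three bytes starting at $pos.
sub hex_at {
    my ($bytes, $pos) = @_;
    return unpack 'H*', substr($bytes, $pos, 3);
}

my $pos    = index($text, "\x{e9}");        # position of the e-acute
my $latin1 = encode('ISO-8859-1', $text);   # e-acute stays a single byte
my $utf8   = encode('UTF-8', $text);        # e-acute becomes two bytes

print "Latin-1 bytes: ", hex_at($latin1, $pos), "\n";   # e96d6f
print "UTF-8 bytes:   ", hex_at($utf8,   $pos), "\n";   # c3a96d
```

The Latin-1 form starts with the lone byte 0xe9 followed by "mo", which is not a valid UTF-8 sequence; the explicitly encoded form starts with 0xc3 0xa9, which is.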

      That did take care of the encoding error, though I don't understand why, because I thought Perl was natively UTF-8. Anyway, it works.

      But now when I try to use that value to update a record like so:

          my $sth = $dbh->prepare("update userinfo set description=? where uid=$uid");
          $sth->execute(encode('UTF-8',$userdata->[$i]->{description})) || die $sth->errstr;

      I get DBD::PgPP::st execute failed: ERROR:  array value must start with "{" or dimension information at character 33

      This is confusing, as the string is En démocratie, on a le droit d'avoir tort and character 33 is either an "a" or a "v", depending on whether it's counting from zero.

        Hi,

        I'm pretty sure it has nothing to do with encoding, but with the way the related columns are defined. Please show us your table definition of userinfo.

        McA
