http://www.perlmonks.org?node_id=1011485

ev0lution has asked for the wisdom of the Perl Monks concerning the following question:

I'm pretty stumped. I'm pulling from a MySQL database using DBI, however I'm getting junk when I dump it to an array. Here's the code:

my $dbh = DBI->connect("DBI:mysql:$db_name1:$db_host1:$db_port1","$db_ +user1","$db_pw1", {RaiseError => 1, PrintError => 0, mysql_enable_utf +8 => 1, ChopBlanks =>0}); $log->debug("Running SQL Select statement: $sqlselect" . "\n"); my $RowSet = $dbh->selectall_arrayref("SELECT * FROM CollectionDB.ISI +S WHERE SourceID = '$PassedSource' ORDER BY ObjectKey DESC");

Now, after that I set each element of the array to a set of strings, then run data_string_desc on them. This is the result:

artTitle is ?????? ?? ??? ???? ???? ???? Encoding test on Title results: UTF8 off, ASCII, 28 characters 28 byte +s

I'm not sure how, even with UTF-8 enabled while DBI is connecting to the DB and UTF-8 set for every configuration on the database itself, the string is still appearing as ASCII. I need the UTF-8 because a majority of the data I'm bringing in is multibit (Chinese, Arabic, etc..)

Thanks in advance for any insight you can provide!

Replies are listed 'Best First'.
Re: Encoding issue from DBI to string
by marto (Cardinal) on Jan 03, 2013 at 16:55 UTC

      To expand on what marto said, learn how to create SQL with bound variables

      my $rowSet = $dbh->selectall_arrayref("SELECT * FROM CollectionDB.ISIS + WHERE SourceID = ? ORDER BY ObjectKey DESC", {}, $PassedSource);

      A Monk aims to give answers to those who have none, and to learn from those who know more.
        Thanks for the heads up. Not well versed in MySQL, nor Perl for that matter. Just sort of had all of this dumped on my plate and learning as I'm going.
Re: Encoding issue from DBI to string
by karlgoethebier (Abbot) on Jan 03, 2013 at 17:06 UTC

    Did you set try to set...

    default-character-set=utf8 default-collation=utf8_unicode_ci

    ...in your my.cnf and restart mysqld?

    OK, i'm guessing again. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Encoding issue from DBI to string
by sundialsvc4 (Abbot) on Jan 03, 2013 at 18:04 UTC

    The use of placeholders is absolutely vital when using SQL, especially in a web-site where the inputs can never be fully trusted.   The placeholders, an un-quoted question mark (therefore, not a literal-string), represent places where consecutive values from a separately-supplied list will be inserted.   In this way, it is impossible for any of the data to be misinterpreted (that is, “injected”) as part of the SQL statement itself.   Nor can there ever be any question about where each value begins or ends.   The data-type of each parameter in the list is independently known and does not have to be, nor converted to, a character-string.   (At least, not by you.)   Furthermore, this might (or might not) also solve your character-encoding concern:   the SQL string is one “thing,” and each parameter is another, such that never the twain shall meet.   It’s very clean and black-box-ish:   “here’s the SQL string, and here’s a rag-tag bag of parameters I want to use with it; now, go do it.”

      "The use of placeholders is absolutely vital when using SQL..."

      Nothing but the truth. But perhaps this is yet another communication problem.

      I'm just wondering about what will happen when i try to force mysql_enable_utf8 => 1 in my DBI call when default-character-set=utf8 and default-collation=utf8_unicode_ci is not set in my.cnf...

      For the moment i don't have any mySQL at the hand to check this.

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Even after adding in the default-collation=utf8_unicode_ci, I'm still getting all question marks for multibyte characters. What a headache.

        Not sure if anyone who have the answer here, but assuming the DB is set up appropriate with all UTF-8 encoding and data appearing valid in the tables, it wouldn't really matter how it's getting into the database to begin with, right?

        We're using a 3rd party program as a scraper, and its underlying Java is dumping the data to the DB. I haven't looked into it much just because the data appears right in the DB with all UTF-8 encoding configured, so I assumed it wasn't the issue.

        Hey Karl, Upon looking in the my.cnf file, I had default-character-set = utf8 and collation-server = utf8_unicode_ci, however default-collation was not set. I made the necessary changes, but had to step into a meeting. I'll take a look at how things go in a bit.
Re: Encoding issue from DBI to string
by graff (Chancellor) on Jan 04, 2013 at 01:08 UTC
    If you haven't tried this yet, see what happens if you change the first arg in your DBI->connect() call from this:
    "DBI:mysql:$db_name1:$db_host1:$db_port1"
    to this :
    "DBI:mysql:$db_name1:$db_host1:$db_port1;mysql_enable_utf8=1"
      Gave it a shot, but still no luck. :-\
Re: Encoding issue from DBI to string
by space_monk (Chaplain) on Jan 04, 2013 at 15:00 UTC

    What version of Perl are you using? Also, if you can find the version of the Perl DBI library and report that too, other Monks may find the information useful.

    Also I think you need to show a bit more code - it would be nice to eliminate that as the problem.

    I'm not sure that the above items will give a result, but if everything on the DB side looks right, then maybe its something to do with Perl.

    A Monk aims to give answers to those who have none, and to learn from those who know more.

      I'm really glad that we are guessing together now because i was just about to loose my contenance about this issue ;-)

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Well, I found the problem and almost have a solution worked out.

        It appears there's some issues between the version of ActivePerl being implemented on the NFS that we use and data being pulled from the DB. When I made a separate Perl script to pull from the DB and ran it using a fresh Perl install, things worked fine.

        Unfortunately, a lot of other things are dependent on the version of Perl on the NFS. Not a huge issue, just going to leave the DB work in a separate Perl script and call it from a different version of Perl.

        That being said, running into a small problem with passing arguments from the initial Perl script to the now separated one. Code below:

        Initial Perl:

        system('/usr/bin/perl /home/SQLtoXML.pl $source_id $OutputXMLDir');
        Perl script that is being called:
        my $PassedSource = $ARGV[0]; my $FinalDir = $ARGV[1];

        Unfortunately, $PassedSource and $FinalDir are both ending up as null when the program runs. Am I just passing the 2 arguments incorrectly?

        Thanks again for all the help. You've been wonderful.

Re: Encoding issue from DBI to string
by karlgoethebier (Abbot) on Jan 03, 2013 at 18:01 UTC
    "Danger Will Robinson..."

    But nevertheless, isn't the statement valid? Really no encoding issue?

    Please correct me if i'm really totally wrong. Even if considered harmful.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Are you asking me if SQL injection is considered harmful, or worth mentioning?

        "Are you asking me if SQL injection is considered harmful, or worth mentioning?"

        No, i've just been thinking that i'm totally wrong...(self-doubt)?

        «The Crux of the Biscuit is the Apostrophe»