Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

DBD::Oracle - Character Encoding Conversion

by Zygomax (Initiate)
on Sep 21, 2015 at 14:41 UTC ( [id://1142633]=perlquestion: print w/replies, xml ) Need Help??

Zygomax has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm having a problem using DBD::Oracle. The locale of the system is en_US.UTF-8, and that is the encoding we use for our string-processing. We are selecting data from and inserting data into an Oracle database. The database (default) character set is WE8ISO8859P15. By setting the environment variable NLS_LANG to "AMERICAN_AMERICA.AL32UTF8", we can insert strings with non-ASCII characters just fine. However, when we run a SELECT statement and retrieve the data using fetchall_hashref(), the strings remain in ISO-8859-15--we need them to be UTF-8.

Does anyone know of a DBI or DBD::Oracle setting that would cause the strings to be converted?

All systems are SUSE Linux 11 SP4 x64.

Thanks!

Replies are listed 'Best First'.
Re: DBD::Oracle - Character Encoding Conversion
by kennethk (Abbot) on Sep 21, 2015 at 15:38 UTC
    The packaged code I use to connect to Oracle is:
    $ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8'; my $db = DBI->connect( $dbid, $user, $pass, { PrintError => 0, RaiseError => 1, AutoCommit => 0, ora_charset => 'AL32UTF8', }, ); $db->do("ALTER SESSION SET NLS_DATE_FORMAT='yyyy-mm-dd'");
    As I remember, I had to use both the NLS_LANG environmental variable and the ora_charset connect attribute to successfully insert and extract. This seems to be in conflict with the docs, which imply they are redundant.

    The bigger question to consider is how do you know you've inserted correctly, and how do you know your extract is failing? There are many points along a pipeline where an encoding can fail. Have you verified basic input and output? Printed out the string using an encoding such as HTML::Entities so that the output is ASCII?


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: DBD::Oracle - Character Encoding Conversion
by Anonymous Monk on Sep 21, 2015 at 16:52 UTC
    To echo what Ken said: print what is in the file, say using HTML entities, then print what is in the database table the same way. Verify this by asking an (Oracle) vendor program to print the actual content of a selected column as hex bytes. A real vexation of all things I18N is that anything you "see" is always being translated, even by a shell-terminal program. Unless you hex-dump you really do not know what you are seeing: what you start with, what you end with, what actually is the status along the way. You've got to look at the BYTES.
Re: DBD::Oracle - Character Encoding Conversion
by Zygomax (Initiate) on Sep 21, 2015 at 18:06 UTC

    kennethk's advice worked, although I needed to set the ora_charset attribute to 'WE8ISO8859P15' since that is the default charset of the Oracle DB.

    I definitely understand that the characters you see are not necessarily the characters stored. The reliable way of seeing what was in the Oracle DB was to use the RAWTOHEX SQL function. It's a bit hard to read (no spaces) using SQL*Plus, but I suppose you could read it into your Perl script and print it out better.

    On the Perl side, Data::Dumper shows an escaped character if the charset isn't set: e.g. "resum\x{e9}", where e9 corresponds to "e with acute accent mark". With ora_charset specified, it correctly displays the UTF-8 text.

    And then there is whether your console displays the characters correctly. My main console program doesn't, but I have another one that does.

    Thanks for the help!

      No, you probably could not "print it out better," because it would contain character escapes ... and your terminal, if nothing else, would try its best to print it "properly." I have been known to copy hex sequences to two different text files and then use 'diff' to compare them. As you say, it is very maddening, but you simply cannot 'eyeball' this stuff. There are too many unrelated-things between the data and your eyeballs!
      I'm glad things are working, but as you have both insert and select operations happening, make sure that you can insert a new string, verify its content in the DB, and extract it again correctly. With a mixed mode like that, I would be concerned that what's working now is some data that was previously corrupted.

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      OK I think I jumped the gun on declaring it fixed. The issue is actually addressed in a CPAN bug here.

      What was happening when I set the "ora_charset" attribute to "WE8ISO8859P15" in connect() was that it stores UTF8 strings into the column of the table. However, those strings should really be ISO-8859-15 because that is the default character set of the DB. (And the column is CHAR not NCHAR.)

      So it looks like I'll need to convert the strings returned from the DB to UTF-8 manually. Not a big problem. If anyone has another idea, let me know.

        As a matter of etiquette, it's considered appropriate to reply to those who reply to you. In addition to maintaining a thread of conversation, PerlMonks can be configured to send a message when someone replies to one of your nodes, so it makes it more likely that the original respondent can continue assisting.

        Can you define manually? I've fixed corrupted files in the past using carefully crafted regular expressions, but that is usually not the right choice. Better would be using Encode to do transformations once you know how your input stream is formatted. Encode::Guess might be helpful there, though it has limited utility because it's such a messy problem. As some background, you should read through perluniintro if you haven't already.

        TL;DR- What you are proposing is possible, but sounds a lot like nested band-aids.


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1142633]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-03-28 21:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found