Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Solved: Preserving UTF-8 characters in Oracle and XML

by poltr1 (Novice)
on Nov 07, 2011 at 21:47 UTC ( [id://936587]=perlquestion: print w/replies, xml ) Need Help??

poltr1 has asked for the wisdom of the Perl Monks concerning the following question:

(Actually, this isn't a question, but a solution in case others are looking for the same nugget of wisdom.)

I have incoming data in XML that's UTF-8 encoded. It includes special symbols such as nonbreaking spaces, the registered trademark symbol (R), the "TM" symbol, Greek letters, etc. This data is being put into an Oracle database via script that reads the XML, parses it using XML::Twig, and saves it to the database via DBI. In order to preserve these characters, here's what I had to do:

1) Add this line to Perl scripts:

use encoding "utf-8"; # For text containing I +SO-8859-1 and/or UTF-8 characters

2) Add the encoding qualifier to any 'open' statements on files that contain UTF-8:

open (OUT, ">:encoding(UTF-8)", ">$outfile") || die "Unable to open $o +utfile for writing.";

3) Add this line if interfacing with an Oracle database:

$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8'; # Handle UTF-8 charac +ters

4) Modify the Oracle login to include the 'ora_charset => "UTF8"' attribute in the connect string:

my $dbh = DBI->connect("dbi:Oracle:$DATABASE", $USERNAME, $PASSWORD, { RaiseError => 1, AutoCommit => 0, ora_charset => "UTF8" }) or die ($DBI::errstr . "\n");

5) If processing XML files with encoding="UTF-8" via XML::Twig, add the output_encoding attribute to the call:

$twig = XML::Twig->new( twig_handlers => $handlers, output_encoding => + 'utf-8' );

Replies are listed 'Best First'.
Re: Solved: Preserving UTF-8 characters in Oracle and XML
by poltr1 (Novice) on Nov 09, 2011 at 15:37 UTC

    Some more gotchas I encountered along the way:

    1) I use PuTTY to log in to Unix systems. In order for me to properly display UTF-8 characters, I needed to set my character encoding to UTF-8 in my PuTTY configuration. (Window/Translation.)

    2) Some data providers may translate UTF-8 characters to their HTML character entity equivalents, and pass it off as UTF-8. To correct this, use the HTML::Entities package from CPAN, and call the decode_entities() function.

Re: Solved: Preserving UTF-8 characters in Oracle and XML
by mje (Curate) on Nov 10, 2011 at 15:55 UTC

    I don't think you need ora_charset and NLS_LANG setting - one or the other will do. Also, be careful with ora_charset if you make more than one connection and expect to use different client charsets on each one as by default charsets are per environment and by default DBD::Oracle only creates one environment handle (see ora_envhp).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://936587]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-25 23:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found