PerlMonks  

Problem with Unicode Characters while reading from oracle database in perl script

by venu_hs (Novice)
on Dec 12, 2012 at 12:13 UTC ( #1008495=perlquestion )
venu_hs has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I am using DBI to connect to Oracle, fetch data, and write it to a CSV file. I am running this Perl script on Linux. When I open the CSV file in Windows, I see some unsupported characters (inverted question marks). To solve this and present the data as it is, I set the following environment variable:

setenv NLS_LANG AMERICAN_AMERICA.WE8MSWIN1252

This has not worked. I think this is because the above NLS_LANG setting is only for Windows.

Could you please tell me whether I can use a different NLS_LANG setting on Linux to solve the issue with the Unicode strings?

Thanks

Re: Problem with Unicode Characters while reading from oracle database in perl script
by mje (Deacon) on Dec 12, 2012 at 13:15 UTC

    Have you tried AMERICAN_AMERICA.AL32UTF8?

Re: Problem with Unicode Characters while reading from oracle database in perl script
by 2teez (Priest) on Dec 12, 2012 at 15:06 UTC

    Hi venu_hs,
    .. and write to a csv file. ... When i open the csv in windows, i could see some unsupported characters (inverted question mark)..

    If I may ask, how are you writing your CSV file?
    You could look at Text::CSV or Text::CSV_XS if these are not the modules you are already using to write your CSV file.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Problem with Unicode Characters while reading from oracle database in perl script
by afoken (Parson) on Dec 12, 2012 at 18:24 UTC

    Some generic hints for encoding problems, especially with Unicode:

    Look at input and output files with a hex dumper / hex editor.
        Far too many editors convert encodings behind the scenes and display garbage when their idea of how the file is encoded differs from how it is really encoded. od is available on most Unix systems, and there are tons of other hex dumpers and hex editors.
    Check the length of strings in perl.
        length always returns the number of characters. If you mess up the encodings, for example by making perl read a "Unicode string" (a string encoded as UTF-8, UTF-16, and so on) as bytes, process it, and write it out as bytes, the code appears to work, but some things (e.g. matching characters) behave strangely. The string "AOUÄÖÜ", encoded as UTF-8, uses the byte sequence 41 4F 55 C3 84 C3 96 C3 9C. Read with the proper encoding setting, length will return 6. Read as a byte stream, or with a "byte=character" encoding like ISO-8859-1, length will return 9.
    Check the length of strings in the database.
        (Lesson learned from patching DBD::ODBC.) When communicating with a database, encoding problems hide until you read / write with a different tool or check the lengths. This is essentially the same problem as with file I/O, but there is no simple way to get a hex dump.
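
    The length check and the hex dump described above can be tried without a database. The following is only a minimal sketch using the core Encode module: it builds the byte sequence 41 4F 55 C3 84 C3 96 C3 9C by hand and compares the lengths. The variable names are made up for illustration and are not taken from any of the files mentioned in this post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they would arrive from a file or a database driver
# that does no decoding: the UTF-8 byte sequence quoted in the post.
my $bytes = pack 'H*', '414F55C384C396C39C';

# Poor man's hex dump, comparable to what od or a hex editor shows.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Decoded with the proper encoding: 3 ASCII letters plus 3 umlauts.
my $chars = decode('UTF-8', $bytes);

# Decoded with a "byte = character" encoding: every byte becomes a character.
my $latin1 = decode('ISO-8859-1', $bytes);

# Encoding the character string again must give back the original bytes.
my $round_trip_ok = encode('UTF-8', $chars) eq $bytes ? 1 : 0;

printf "hex dump          : %s\n", $hex;
printf "length as bytes   : %d\n", length $bytes;
printf "length as UTF-8   : %d\n", length $chars;
printf "length as Latin-1 : %d\n", length $latin1;
printf "round trip ok     : %d\n", $round_trip_ok;
```

    If the length of a string fetched from the database matches the byte count rather than the character count, the data was most likely not decoded on the way in.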

    Feel free to copy from the files t/40UnicodeRoundTrip.t, t/41Unicode.t, and t/UChelp.pm included in DBD::ODBC.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
