
Re^4: Japanese character in Linux

by prafulltc (Acolyte)
on Jul 08, 2011 at 06:04 UTC

in reply to Re^3: Japanese character in Linux
in thread Japanese character in Linux

In Sybase, the Japanese data columns are stored in Shift-JIS encoding.

We retrieve this data using DBI:
use DBI qw(:sql_types);

if ( @row = $dbFOX_sth->fetchrow_array ) {
    ( $sInstrumentNameJ, $sInstrumentShortJ ) = @row;
}

When we print this data to the Unix console, it comes out as junk.

Once the value is in a variable, we pass it to a stored procedure that inserts it into an Oracle NVARCHAR2 column.

There it shows up as an inverted question mark.

Please advise.

Re^5: Japanese character in Linux
by andal (Hermit) on Jul 08, 2011 at 07:43 UTC

    I guess we'll have to go step by step. First, add "use Encode;". After you've fetched the values from the DB, check whether the driver has already converted them to Perl's internal character format using

    print Encode::is_utf8($sInstrumentNameJ), "\n";
    If this prints "1", the value has been converted to Perl's internal form, and we should look at how you output it to the terminal. If it prints an empty string, the driver has not converted the value, and you have to convert it manually.

    In either case, we need to know which locale is active in your terminal emulator. Normally it should be some UTF-8 locale, but who knows. Please post the output of the "locale" command.

    Also, if "is_utf8" returns an empty string, it would help to post a hexdump of the value you get from the database, for example with

    print unpack("H*", $sInstrumentNameJ), "\n";
    Please also post the Japanese text it should correspond to.
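The two checks above can be combined into one small diagnostic. This is only a sketch: `describe_value` is a hypothetical helper name, and the sample value is a stand-in for a fetched column, built from the hex digits reported further down in this thread.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of DBI or Encode): report whether a value
# carries Perl's UTF-8 flag and show its contents in hex.
sub describe_value {
    my ($value) = @_;
    my $flag = Encode::is_utf8($value) ? 1 : 0;
    my $hex  = $flag
        # Character string: list the code points instead of raw bytes.
        ? join( ' ', map { sprintf 'U+%04X', ord($_) } split //, $value )
        # Byte string: "H*" prints every byte as two hex digits.
        : unpack( 'H*', $value );
    return "is_utf8=$flag hex=$hex";
}

# Stand-in for a value fetched from the database (an assumption for the demo).
my $sInstrumentNameJ = pack( 'H*', '81698a94816a8bc9976d' );
print describe_value($sInstrumentNameJ), "\n";
```

A result of `is_utf8=0` means the driver handed back raw bytes, so they still have to be decoded by hand.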

      is_utf8 returns nothing, and the output of unpack is 81698a94816a8bc9976d.
      We also tried the find_encoding function of the Encode module; its output is Encode::XS=SCALAR(0xaad27d0).
      We are not getting the actual output when we use the encode and decode functions.
      Output of locale command is as below
      Please suggest where we are going wrong.

        Quite a few things are wrong. First of all, with the "C" locale it is not possible to see any Japanese text, or indeed any text outside ASCII. So it looks like you have to change your locale first. Assuming you use xterm as your terminal emulator, try starting it this way

        LC_CTYPE="en_US.UTF-8" xterm &
        Then check the output of the locale command in that new terminal window. It should produce something like
        LANG=
        LC_CTYPE=en_US.UTF-8

        After that, you should convert your text to UTF-8. Since you haven't provided the Japanese text corresponding to the hexdump, and since the hexdump looks like UTF-16, I'll assume that it is UTF-16. Then you should output the variable like this:

        Encode::from_to($your_variable, "UTF-16", "UTF-8");
        print $your_variable, "\n";
        If this does not produce the expected result, your terminal emulator may be using a font without Japanese glyphs :)
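For comparison, here is a minimal sketch of the same conversion under a different assumption: that the bytes are Shift-JIS, as the opening question says of the Sybase columns, rather than UTF-16. The input is the hexdump posted above. Whether "shiftjis" or the Microsoft variant "cp932" is the right name for this particular database is not known from the thread.

```perl
use strict;
use warnings;
use Encode ();

# Bytes taken from the hexdump posted in this thread.
my $bytes = pack( 'H*', '81698a94816a8bc9976d' );

# Assumption: the data is Shift-JIS ("cp932" would be the Windows variant).
# FB_CROAK makes decode() die on malformed input instead of guessing;
# LEAVE_SRC stops decode() from emptying $bytes while it works.
my $text = Encode::decode( 'shiftjis', $bytes,
    Encode::FB_CROAK | Encode::LEAVE_SRC );

printf "%d bytes decoded to %d characters\n", length($bytes), length($text);

# Encode the character string to UTF-8 bytes for a UTF-8 terminal.
my $utf8_bytes = Encode::encode( 'UTF-8', $text );
print $utf8_bytes, "\n";
```

If decode() dies here, the Shift-JIS assumption is wrong for that value, which is more useful to know than silently printing substitution characters.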

        Actually, sending the variable to the terminal is probably not the most important thing. You have to work with that value inside Perl, and depending on what you want to do, you have to convert your variables back and forth. For example, to do pattern matching on it, you should first do

        use utf8;
        my $converted = Encode::decode("UTF-16", $your_variable);
        $converted =~ /some Japanese text/;
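A small sketch of why decoding has to come before pattern matching. It again assumes the bytes are Shift-JIS (per the opening question) instead of the UTF-16 used in the snippet above, and it uses the Unicode property \p{Han} so that the example needs no Japanese literals in the source file.

```perl
use strict;
use warnings;
use Encode ();

# Bytes taken from the hexdump posted in this thread.
my $bytes = pack( 'H*', '81698a94816a8bc9976d' );

# Assumption: Shift-JIS input. After decoding, $text holds characters.
my $text = Encode::decode( 'shiftjis', $bytes );

# On the raw bytes, each byte is treated as a separate Latin-1 character,
# so a kanji pattern cannot match anything.
my @raw_hits = $bytes =~ /(\p{Han})/g;

# On the decoded string, the regex engine sees real characters.
my @kanji = $text =~ /(\p{Han})/g;

printf "kanji found in raw bytes: %d, in decoded text: %d\n",
    scalar(@raw_hits), scalar(@kanji);
```

The same rule applies in the other direction: encode the string again (to UTF-8, or whatever the target expects) only at the point where it leaves the program.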

Re^5: Japanese character in Linux
by Corion (Pope) on Jul 08, 2011 at 06:42 UTC

    Please see points 2 to 5 of my reply.
