PerlMonks  

Read/Write Chinese Characters

by lordsll (Novice)
on Oct 15, 2012 at 18:36 UTC
lordsll has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to process an Excel file with Chinese characters in one of the columns and then write those characters out to a file. I have reduced the code as much as I know how. The output file contains "? ?" where the input was "李 氏". I am fairly new to Perl and would appreciate help. Thank you for your time.
#!/usr/bin/perl
use warnings;
use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE qw(in with);
use utf8;
use Encode;

my $editOWS_file = 'C:\\Users\\lordsll\\Downloads\\Chinese_output.out';
open OUT1, ">:encoding(UTF-8)","$editOWS_file" or die "Can't write on file $editOWS_file: $!\n";

my $Excel = Win32::OLE->new("Excel.Application");
$Excel->{Visible} = 1;
my $Book = $Excel->Workbooks->Open("C:\\Users\\LordsLL\\Downloads\\problem_sample.xlsx");
$Excel->Worksheets(1)->Activate();
my $cell = $Excel->Worksheets(1)->Range("A1")->{Value};
print OUT1 "$cell\n";

Re: Read/Write Chinese Characters
by remiah (Hermit) on Oct 15, 2012 at 20:17 UTC

    Hello lordsll.

    I have no xlsx environment at hand right now, but I wonder what the encoding of $cell is.

    Before that, which program are you using when you see "? ?"? Do you see the same thing when you open "Chinese_output.out" in a browser?

    regards.

Re: Read/Write Chinese Characters
by Anonymous Monk on Oct 15, 2012 at 20:22 UTC
    Use a hex-dump tool to examine the actual byte-by-byte content of the file. You need to know exactly what data this "? ?" sequence corresponds to. If the bytes are in fact the correct byte groups for these two characters, then your problem is that you haven't told whatever is processing the output file that these two characters (and hence the whole file) are Chinese. Not knowing what character set to use, and having no equivalent in the character set being used, it's spitting out question marks. But that's not a fault of the data.
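
    If no hex-dump tool is at hand, a few lines of Perl can do the same check. This is only a minimal sketch: the helper name hex_dump is made up for illustration, and the script expects the path of the output file as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw bytes of a file as
# space-separated two-digit hex values.
sub hex_dump {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't read $path: $!\n";
    local $/;                       # slurp the whole file at once
    my $bytes = <$fh>;
    close $fh;
    $bytes = '' unless defined $bytes;
    return join ' ', unpack '(H2)*', $bytes;
}

# Usage: perl hexdump.pl Chinese_output.out
print hex_dump($ARGV[0]), "\n" if @ARGV;
```

    If the file really holds literal question marks, the dump starts with 3f 20 3f. If it holds "李 氏" in UTF-8, it starts with e6 9d 8e 20 e6 b0 8f, and the problem is on the viewing side rather than in the data.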
Re: Read/Write Chinese Characters
by lordsll (Novice) on Oct 15, 2012 at 21:05 UTC
    Thank you for your help. I am not sure how to tell how the cell is encoded. The characters display correctly within Excel. The problem acts the same in either xls or xlsx. I am displaying the value of the file correctly. I am using an editor that reads and displays all Chinese files correctly except those written by Perl.

      Does your editor treat it as GB2312, UTF-8, or Big5?

      Does it display correctly when you open the file (Chinese_output.out) in a browser and change the encoding from the browser's menu?

Re: Read/Write Chinese Characters
by CountZero (Bishop) on Oct 15, 2012 at 21:22 UTC
    Have a look at the xls2csv script. It can recode your spreadsheet into a CSV file and change the encoding on the fly.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Read/Write Chinese Characters
by lordsll (Novice) on Oct 15, 2012 at 21:31 UTC
    The problem appears to be in reading the data from Excel. If I read the data and immediately write it out to B1, I get the "? ?".
Re: Read/Write Chinese Characters
by nikosv (Hermit) on Oct 16, 2012 at 06:00 UTC
    Set Win32::OLE to UTF-8 mode?

    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

      I have now reproduced the OP's situation on Windows.

      And you are right.

      Without the code page option set, it returns values in the default code page (CP932 in my case), and it sometimes fails in cases like 'ri' + 'space'.

      With the code page set to Win32::OLE::CP_UTF8, as you said, it returns decoded characters and works fine.

      thanks and regards.

      Thank you so much. It works great. I really appreciate everyone's help.
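
Putting the thread together, the original script with the code page option applied might look like the sketch below. It is not the OP's final code, which was never posted. The paths are the ones from the question; the lexical filehandle, the LastError check, and the Close/Quit calls at the end are additions for tidiness, not part of the fix. It needs Windows with Excel installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';

# The fix from this thread: have Win32::OLE return strings as UTF-8
# instead of converting them to the system's default code page.
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

my $out_file = 'C:\\Users\\lordsll\\Downloads\\Chinese_output.out';
open my $out, '>:encoding(UTF-8)', $out_file
    or die "Can't write to file $out_file: $!\n";

# Addition (not in the original): stop early if Excel cannot be started.
my $Excel = Win32::OLE->new('Excel.Application')
    or die "Can't start Excel: " . Win32::OLE->LastError . "\n";
$Excel->{Visible} = 1;

my $Book = $Excel->Workbooks->Open('C:\\Users\\LordsLL\\Downloads\\problem_sample.xlsx');

# Read cell A1 of the first worksheet and write it out as UTF-8.
my $cell = $Book->Worksheets(1)->Range('A1')->{Value};
print {$out} "$cell\n";
close $out;

# Additions (not in the original): close the workbook without saving
# and shut Excel down.
$Book->Close(0);
$Excel->Quit;
```

The Option call has to come before the cell is read, because the conversion to the default code page happens when the value is fetched from Excel. That matches the OP's observation that the characters were already "? ?" when written straight back to B1.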
