http://www.perlmonks.org?node_id=999149

lordsll has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to process an Excel file with Chinese characters in one of the columns and then write those characters out to a file. I have reduced the code as much as I know how. The output file contains "? ?" where the input was "李 氏". I am fairly new to Perl and would appreciate help. Thank you for your time.
#!/usr/bin/perl
use warnings;
use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE qw(in with);
use utf8;
use Encode;

my $editOWS_file = 'C:\\Users\\lordsll\\Downloads\\Chinese_output.out';
open OUT1, ">:encoding(UTF-8)","$editOWS_file" or die "Can't write on file $editOWS_file: $!\n";

my $Excel = Win32::OLE->new("Excel.Application");
$Excel->{Visible} = 1;
my $Book = $Excel->Workbooks->Open("C:\\Users\\LordsLL\\Downloads\\problem_sample.xlsx");
$Excel->Worksheets(1)->Activate();
my $cell = $Excel->Worksheets(1)->Range("A1")->{Value};
print OUT1 "$cell\n";

Replies are listed 'Best First'.
Re: Read/Write Chinese Characters
by CountZero (Bishop) on Oct 15, 2012 at 21:22 UTC
    Have a look at the xls2csv script. It can recode your spreadsheet into a CSV file and change the encoding on the fly.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Read/Write Chinese Characters
by Anonymous Monk on Oct 15, 2012 at 20:22 UTC
    Use a hex-dump tool to examine the actual byte-by-byte content of the file. You need to know exactly what data this "? ?" sequence corresponds to. If the bytes are in fact the correct byte groups for these two characters, then the problem is that you haven't told whatever program is displaying the output file that these characters (and hence the whole file) are Chinese. Not knowing which character set to use, and finding no equivalent in the character set it is using, that program prints question marks. In that case the data itself is not at fault.
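    The hex-dump check can also be done from Perl itself. The following is a minimal sketch, not part of the original thread: the helper name hex_dump is hypothetical, and the file to inspect is assumed to be passed on the command line. It prints the bytes that a UTF-8 file should contain for "李 氏" next to the bytes of a literal "? ?", so the two cases can be told apart.

```perl
use strict;
use warnings;
use utf8;                       # this source contains literal Chinese characters
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# Bytes the file should contain if the characters survived (UTF-8).
my $expected = hex_dump(encode('UTF-8', "李 氏"));
# Bytes it contains if the characters were already replaced by question marks.
my $lost = hex_dump("? ?");

print "expected: $expected\n";  # e6 9d 8e 20 e6 b0 8f
print "lost:     $lost\n";      # 3f 20 3f

# To inspect a real file, read it in raw mode so no decoding layer interferes.
# Assumed usage: perl dump.pl Chinese_output.out
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Can't read $ARGV[0]: $!\n";
    local $/;                   # slurp the whole file
    my $content = <$fh>;
    close $fh;
    print "file:     ", hex_dump($content), "\n";
}
```

    If the file line shows 3f 20 3f, the characters were lost before the file was written, and no choice of viewer encoding will bring them back.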
Re: Read/Write Chinese Characters
by remiah (Hermit) on Oct 15, 2012 at 20:17 UTC

    Hello lordsll.

    I have no xlsx environment at the moment, but I wonder what the encoding of $cell is.

    Before that, which program are you using when you see "? ?"? Do you see the same thing when you open "Chinese_output.out" in a browser?

    regards.

Re: Read/Write Chinese Characters
by nikosv (Deacon) on Oct 16, 2012 at 06:00 UTC
    Have you tried setting Win32::OLE to UTF-8 mode?

    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

      I have now reproduced the OP's situation on Windows.

      And you are right.

      Without the code page option set, Win32::OLE returns the value in the default code page, CP932 in my case, and it sometimes fails in cases like 'ri' + 'space'.

      With the code page set to Win32::OLE::CP_UTF8, as you said, it returns decoded characters and works fine.

      thanks and regards.

      Thank you so much. It works great. I really appreciate everyone's help.
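
      For reference, here is a sketch of how the original script might look with the code page option applied. It is an illustration rather than the OP's final code: the lexical filehandle, the 'Quit' destructor argument and the error checks are additions, the paths are the ones from the question, and it assumes (as reported above) that the cell value comes back as decoded characters once CP_UTF8 is set. It runs only on Windows with Excel installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE;                 # Windows only; requires an installed Excel

# Ask Win32::OLE to exchange strings with Excel as UTF-8 instead of the
# system's default ANSI code page.
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

my $out_file = 'C:\\Users\\lordsll\\Downloads\\Chinese_output.out';
my $workbook = 'C:\\Users\\LordsLL\\Downloads\\problem_sample.xlsx';

open my $out, '>:encoding(UTF-8)', $out_file
    or die "Can't write to $out_file: $!\n";

# The second argument names the method to call when the object is destroyed,
# so Excel is shut down even if the script dies early.
my $excel = Win32::OLE->new('Excel.Application', 'Quit')
    or die "Can't start Excel: ", Win32::OLE->LastError, "\n";
my $book = $excel->Workbooks->Open($workbook)
    or die "Can't open $workbook: ", Win32::OLE->LastError, "\n";

my $cell = $book->Worksheets(1)->Range('A1')->{Value};
print {$out} "$cell\n";

$book->Close(0);                # close the workbook without saving
close $out or die "Can't close $out_file: $!\n";
```

      The Option call has to come before the cell is read, since it controls how strings are converted on their way out of Excel.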
Re: Read/Write Chinese Characters
by lordsll (Novice) on Oct 15, 2012 at 21:31 UTC
    The problem appears to be in reading the data from Excel. If I read the data and immediately write it out to B1, I get the "? ?".
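
    One way to confirm that the loss happens at read time is to look at the code points of the string right after it is read, before any output layer is involved. A minimal sketch; the helper name code_points is hypothetical, and the two strings are stand-ins for the value returned by ->{Value}:

```perl
use strict;
use warnings;
use utf8;                       # this source contains literal Chinese characters

# Hypothetical helper: list the code points of a string as U+XXXX.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

# Stand-ins for the value read from the cell.
my $intact  = "李 氏";          # what the worksheet holds
my $damaged = "? ?";            # what arrives if the characters were replaced

print code_points($intact),  "\n";   # U+674E U+0020 U+6C0F
print code_points($damaged), "\n";   # U+003F U+0020 U+003F
```

    If the value read from A1 already consists of U+003F characters, the original text is gone at that point and the output encoding is not the cause.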
Re: Read/Write Chinese Characters
by lordsll (Novice) on Oct 15, 2012 at 21:05 UTC
    Thank you for your help. I am not sure how to tell how the cell is encoded. The characters display correctly within Excel. The problem behaves the same with either xls or xlsx. I believe I am displaying the contents of the file correctly: I am using an editor that reads and displays all Chinese files correctly except those written by Perl.

      Does your editor treat the file as GB2312, UTF-8, or Big5?

      If you open the file (Chinese_output.out) in a browser and switch encodings from the browser's menu, does any of them display it correctly?
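
      The same experiment can be run from Perl instead of a browser menu: try decoding the file's bytes under each candidate encoding and see which ones succeed. This is a sketch under assumptions, namely that the Encode names euc-cn (the usual encoded form of GB2312) and big5-eten (a common Big5 variant) are the relevant ones; the helper try_decodings is hypothetical.

```perl
use strict;
use warnings;
use utf8;                       # this source contains literal Chinese characters
use Encode qw(encode decode);

my @candidates = ('UTF-8', 'euc-cn', 'big5-eten');

# Hypothetical helper: return a hash ref mapping each candidate encoding
# under which $bytes decodes without error to the decoded text.
sub try_decodings {
    my ($bytes) = @_;
    my %ok;
    for my $name (@candidates) {
        my $copy = $bytes;      # decode() with a CHECK value may modify its argument
        my $text = eval { decode($name, $copy, Encode::FB_CROAK) };
        $ok{$name} = $text if defined $text;
    }
    return \%ok;
}

# Demonstration with bytes of known origin; for a real file, read it
# with the '<:raw' layer and pass its contents instead.
my $utf8_bytes = encode('UTF-8', "李 氏");
my $result     = try_decodings($utf8_bytes);
print "decodes cleanly as: ", join(', ', sort keys %$result), "\n";
```

      A clean decode only shows that the bytes are valid in that encoding, not that the encoding is the right one, so the decoded text still needs to be compared with what Excel shows.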