Re: Parsing a .xlsx file with chinese characters

by CountZero (Bishop)
on Oct 02, 2011 at 07:13 UTC

in reply to Parsing a .xlsx file with chinese characters

It would be most helpful if you could post a small script that actually shows the problem you describe. There are so many ways to break UTF-8 encoded files. It is even entirely possible that the problem is with the program you use to view the flat file.


Re^2: Parsing a .xlsx file with chinese characters
by Sithiris (Novice) on Oct 02, 2011 at 19:02 UTC

    At the moment I am just using the synopsis code found on the Spreadsheet::XLSX CPAN page:

    use Spreadsheet::XLSX;

    my $excel = Spreadsheet::XLSX -> new ('sample.xlsx');

    foreach my $sheet (@{$excel -> {Worksheet}}) {
        printf("Sheet: %s\n", $sheet->{Name});
        $sheet -> {MaxRow} ||= $sheet -> {MinRow};
        foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) {
            $sheet -> {MaxCol} ||= $sheet -> {MinCol};
            foreach my $col ($sheet -> {MinCol} .. $sheet -> {MaxCol}) {
                my $cell = $sheet -> {Cells} [$row] [$col];
                if ($cell) {
                    printf("( %s , %s ) => %s\n", $row, $col, $cell -> {Val});
                }
            }
        }
    }

    This printed everything fine, including the more common Unicode characters, but for the Chinese characters it printed text like 礼……’—œ‰™…司–天œ instead. I know the problem is in how I am using the module, because I managed to get a script working with Spreadsheet::ParseExcel and a .xls file. However, my work requires the files to be .xlsx, so I cannot use that script.

      I gave this a test with the exact code you gave on my machine, and it worked great!

      $ perl
      Sheet: Sheet1
      ( 0 , 0 ) => this
      ( 1 , 1 ) => is
      ( 2 , 2 ) => a
      ( 3 , 3 ) => test
      ( 4 , 1 ) => 什麼
      Sheet: Sheet2
      Sheet: Sheet3
      $

      (PM may convert the text (traditional Chinese "shenme" -- what) into an entity here, but it definitely worked in my xterm)

      It may be that whatever you're using to view the file isn't expecting UTF-8; or, perhaps the encoding in the XLSX itself isn't UTF-8 (but I'm not sure if that's an option in XLSX files or what!).
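      To check whether the viewer is the culprit, a minimal sketch along these lines might help. It uses only the core Encode module, and a hard-coded string stands in for a cell value (in the real script that would be $cell->{Val}). It prints the UTF-8 bytes as hex, then shows what those same bytes look like when read as Windows-1252. The choice of cp1252 as the "wrong" encoding is only an assumption for illustration; your viewer may be using something else.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for a cell value (assumption: the real script would use $cell->{Val}).
my $text = "\x{4EC0}\x{9EBC}";    # the two characters from the test sheet

# The UTF-8 bytes that should end up in the output file or terminal.
my $bytes = encode('UTF-8', $text);
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
# prints: E4 BB 80 E9 BA BC

# What a viewer displays if it interprets those bytes as Windows-1252.
my $misread = decode('cp1252', $bytes);

# Tell Perl to encode wide characters as UTF-8 on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$misread\n";
```

      If a hex dump of your output file shows the expected UTF-8 bytes, the data is probably fine and the viewer is misreading it. If the bytes are already wrong, the damage happened earlier, in the script or the module.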

        Thanks for trying it. I'm guessing from a quick Google search for xterm that you are running the script in a non-Windows environment? Is it possible this would have an effect on its success? I'd guess not, considering Excel is a Windows-based programme.

