
Re^2: Parsing a .xlsx file with chinese characters

by Sithiris (Novice)
on Oct 02, 2011 at 19:02 UTC (#929167)

in reply to Re: Parsing a .xlsx file with chinese characters
in thread Parsing a .xlsx file with chinese characters

At the moment I am just using the synopsis code from the Spreadsheet::XLSX CPAN page:

use Spreadsheet::XLSX;

my $excel = Spreadsheet::XLSX->new('sample.xlsx');

foreach my $sheet (@{ $excel->{Worksheet} }) {
    printf("Sheet: %s\n", $sheet->{Name});
    $sheet->{MaxRow} ||= $sheet->{MinRow};

    foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
        $sheet->{MaxCol} ||= $sheet->{MinCol};

        foreach my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) {
            my $cell = $sheet->{Cells}[$row][$col];
            if ($cell) {
                printf("( %s , %s ) => %s\n", $row, $col, $cell->{Val});
            }
        }
    }
}

This printed everything fine, including the more common Unicode characters. The Chinese characters, however, came out as characters like 礼……’—œ‰™…司–天œ instead. I know the problem is in how I am using the module, because I managed to get a script working with Spreadsheet::ParseExcel and a .xls file. My work requires the files to be .xlsx, though, so I cannot use that script.

Re^3: Parsing a .xlsx file with chinese characters
by anneli (Pilgrim) on Oct 03, 2011 at 07:41 UTC

    I gave this a test with the exact code you gave on my machine, and it worked great!

    $ perl
    Sheet: Sheet1
    ( 0 , 0 ) => this
    ( 1 , 1 ) => is
    ( 2 , 2 ) => a
    ( 3 , 3 ) => test
    ( 4 , 1 ) => 什麼
    Sheet: Sheet2
    Sheet: Sheet3
    $

    (PM may convert the text (traditional Chinese "shenme" -- what) into an entity here, but it definitely worked in my xterm)

    It may be that whatever you're using to view the file isn't expecting UTF-8; or, perhaps the encoding in the XLSX itself isn't UTF-8 (but I'm not sure if that's an option in XLSX files or what!).
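    One way to tell those two cases apart is to look at the raw bytes of a cell value before anything tries to display them. The following is only a sketch: looks_like_utf8 is a hypothetical helper name, and the hard-coded $val stands in for whatever $cell->{Val} holds, on the assumption that the module hands back a plain byte string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the byte string decodes cleanly as strict UTF-8.
# Intended for byte strings, not for strings that are already decoded.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Stand-in for a value read from $cell->{Val} (assumption): the UTF-8 bytes
# of the two characters shown in the test output above.
my $val = "\xE4\xBB\x80\xE9\xBA\xBC";

# Print the bytes as hex, followed by the verdict.
printf "%s => %s\n",
    join(' ', map { sprintf '%02X', ord } split //, $val),
    looks_like_utf8($val) ? 'valid UTF-8' : 'not UTF-8';
# prints: E4 BB 80 E9 BA BC => valid UTF-8
```

    If the bytes check out as UTF-8, the data is probably intact and the viewer is the more likely culprit.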

      Thanks for trying it. From a quick Google search for xterm, I'm guessing you are running the script in a non-Windows environment? Could that have an effect on its success? I doubt it, considering Excel is a Windows-based programme.

        You're right; I ran it on a Linux VM.

        If you're running this in the Windows terminal (cmd.exe or what have you), I'm inclined to think the problem isn't with the output from Spreadsheet::XLSX, but that cmd doesn't display UTF-8 properly.

        What if you redirect the output of the script to a .html file, then try loading it in a browser? Make sure the encoding gets detected as UTF-8. If it displays correctly, it's just the terminal, and your data is fine. :)
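        If you'd rather have the script write the page itself, something like this might do. It's only a sketch: write_utf8_html and escape_html are made-up helper names, cells.html is an arbitrary file name, and it assumes the lines you pass in are already UTF-8 encoded bytes (if they turn out to be decoded character strings, you'd want an :encoding(UTF-8) layer instead of :raw).

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap lines that are ALREADY UTF-8 bytes (assumption)
# in a minimal page that declares its encoding, so the browser need not guess.
sub write_utf8_html {
    my ($path, @lines) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} qq{<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body><pre>\n};
    print {$fh} escape_html($_), "\n" for @lines;
    print {$fh} "</pre></body></html>\n";
    close $fh or die "Cannot close $path: $!";
    return;
}

# Stand-in for one line of the script's output, as raw UTF-8 bytes.
write_utf8_html('cells.html', "( 4 , 1 ) => \xE4\xBB\x80\xE9\xBA\xBC");
```

        In the real script you'd collect the sprintf'd lines into an array instead of printing them, then pass the array to the helper.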
