
Re^3: Parsing a .xlsx file with chinese characters

by anneli (Pilgrim)
on Oct 03, 2011 at 07:41 UTC (#929285)


in reply to Re^2: Parsing a .xlsx file with chinese characters
in thread Parsing a .xlsx file with chinese characters

I tested the exact code you gave on my machine, and it worked great!

$ perl test.pl
Sheet: Sheet1
( 0 , 0 ) => this
( 1 , 1 ) => is
( 2 , 2 ) => a
( 3 , 3 ) => test
( 4 , 1 ) => 什麼
Sheet: Sheet2
Sheet: Sheet3
$

(PerlMonks may convert that text, traditional Chinese 什麼 "shenme", meaning "what", into an HTML entity here, but it definitely displayed correctly in my xterm.)

It may be that whatever you're using to view the file isn't expecting UTF-8; or perhaps the text in the XLSX itself isn't encoded as UTF-8 (though I'm not sure whether that's even an option in XLSX files!).
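If the viewer is the culprit, one common Perl-side check is that the output handle actually encodes as UTF-8. The sketch below assumes the module already returns decoded character strings (not verified for the module used in this thread) and uses 什麼 (U+4EC0 U+9EBC) as a stand-in cell value; the in-memory handle is only there to show the exact bytes a viewer would receive.

```perl
use strict;
use warnings;

# Assumption: the cell value is already a decoded Perl character string,
# here "什麼" written as the two code points U+4EC0 and U+9EBC.
my $value = "\x{4EC0}\x{9EBC}";

# Encode everything printed to STDOUT as UTF-8.
# Without this, printing characters above 0xFF triggers a "Wide character" warning.
binmode STDOUT, ':encoding(UTF-8)';
print "( 4 , 1 ) => $value\n";

# The same layer on an in-memory handle shows the exact bytes a viewer receives.
open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
binmode $fh, ':encoding(UTF-8)';
print $fh $value;
close $fh;

# Dump the encoded bytes as hex.
print join(' ', map { sprintf '%02X', ord } split //, $buf), "\n";
# prints: E4 BB 80 E9 BA BC
```

A terminal or editor that reads those six bytes as UTF-8 shows the two characters; one that assumes a legacy code page shows several unrelated symbols instead.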


Re^4: Parsing a .xlsx file with chinese characters
by Sithiris (Novice) on Oct 03, 2011 at 21:32 UTC

    Thanks for trying it. From a quick Google search of xterm, I'm guessing you are running the script in a non-Windows environment? Could that affect whether it works? I doubt it, considering Excel is a Windows-based programme.

      You're right; I ran it on a Linux VM.

      If you're running this in the Windows terminal (cmd.exe or what have you), I'm inclined to think the problem isn't with the output from Excel::Spreadsheet, but that cmd doesn't display UTF-8 properly.

      What if you redirect the output of the script to a .html file, then try loading it in a browser? Make sure the encoding gets detected as UTF-8. If it displays correctly, it's just the terminal, and your data is fine. :)
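A minimal sketch of the same test done from inside the script rather than by redirecting, assuming the cell values are decoded character strings; the file name `output.html` and the `@cells` list are hypothetical stand-ins. The `<meta charset>` tag lets the browser use UTF-8 without guessing. The values are not HTML-escaped here, so real data containing `<` or `&` would need escaping first.

```perl
use strict;
use warnings;

# Hypothetical output file name.
my $file = 'output.html';

# Hypothetical stand-ins for the decoded cell values from the spreadsheet.
my @cells = ( 'this', 'is', 'a', 'test', "\x{4EC0}\x{9EBC}" );

# The :encoding(UTF-8) layer turns Perl characters into UTF-8 bytes on write.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";

# Declare the charset so the browser reads the bytes as UTF-8.
print $out qq{<!DOCTYPE html>\n<html><head><meta charset="UTF-8"></head><body><pre>\n};
print $out "$_\n" for @cells;
print $out "</pre></body></html>\n";

close $out or die "Cannot close $file: $!";
```

If the characters look right in the browser but not in the console, the data is fine and only the terminal's display is at fault.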

        In my script I print to a UTF-8-encoded text file, which I opened in Word; it displayed without any errors, just with the wrong characters.

        What I am thinking is that it may be "deconstructing" the character: for example, instead of "\x{2013}" it is displaying "\xE2", "\x80", "\x93". If this is the case, is there a way to force it to keep the character whole?
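The three bytes quoted are exactly the UTF-8 encoding of U+2013 (EN DASH), which suggests the string is being handled as raw bytes rather than as a decoded character. One plausible cause, not confirmed in this thread, is those bytes being encoded a second time on output (double encoding). The sketch below uses only the core `Encode` module to show the character/byte relationship and how `decode` turns the bytes back into a single character.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $dash  = "\x{2013}";              # EN DASH: one Perl character
my $bytes = encode('UTF-8', $dash);  # its UTF-8 form: three bytes

printf "%d character -> %d bytes: %s\n",
    length $dash, length $bytes,
    join ' ', map { sprintf '\\x%02X', ord } split //, $bytes;
# prints: 1 character -> 3 bytes: \xE2 \x80 \x93

# Double encoding: treating each of those bytes as its own (Latin-1)
# character and encoding again yields six bytes of mojibake.
my $double = encode('UTF-8', $bytes);
printf "double-encoded: %d bytes\n", length $double;
# prints: double-encoded: 6 bytes

# The fix: decode the bytes into characters once, before printing
# through a :encoding(UTF-8) handle.
my $fixed = decode('UTF-8', $bytes);
printf "decoded back: %d character, U+%04X\n", length $fixed, ord $fixed;
# prints: decoded back: 1 character, U+2013
```

If the data really arrives as bytes, decoding it once and then printing through an encoding layer avoids both the split characters and the double encoding.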
