Re^6: Parsing a .xlsx file with chinese characters

in reply to Re^5: Parsing a .xlsx file with chinese characters
in thread Parsing a .xlsx file with chinese characters

My script prints to a UTF-8 encoded text file. When I open that file in Word, it opens without complaint, but it shows the wrong characters.

What I am thinking is that it may be "deconstructing" the character: for example, instead of "\x{2013}" it is displaying "\xE2", "\x80", "\x93". If this is the case, is there a way to force it to treat those bytes as a single character?

Re^7: Parsing a .xlsx file with chinese characters by anneli (Pilgrim) on Oct 05, 2011 at 21:55 UTC
I think Word is probably half-responsible for the mangling here. If it's displaying each byte separately, then it isn't reading the file as UTF-8 at all, but in some other encoding!

I'll give an example using Windows. First, here's utf8.pl: `print "\xe7\x8f\xa0";  # U+73E0 ("pearl")`

Now, I execute that and redirect the output to both utf8.html and utf8.txt. Chrome displays the character correctly, because it assumes UTF-8 by default. Notepad also appears smart enough to guess the encoding. On my system at least, opening the file with Word prompts me to select the encoding; by default, it guesses UTF-8 and renders the character correctly. Note that if I pick "Windows (Default)" or "MS-DOS", I get garbage.

So try messing with Word a bit; if you use the File -> Open menu (instead of just opening the file from Explorer directly), you can get additional conversion options (sometimes!).

Anne
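To make the "wrong encoding" effect concrete, here is a minimal sketch using the three octets from the original question. It assumes the reader's "other encoding" is Windows-1252, which is only a guess at what Word's "Windows (Default)" means on a Western-locale machine; `codepoints` is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: render a string as a list of code points.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# The three octets from the question: the UTF-8 encoding of U+2013.
my $octets = "\xE2\x80\x93";

# Read as UTF-8, the three octets collapse into one character.
my $as_utf8 = decode('UTF-8', $octets);

# Read as Windows-1252 (an assumption about the reader), each octet
# becomes a character of its own.
my $as_cp1252 = decode('cp1252', $octets);

printf "as UTF-8:        %d character(s): %s\n",
    length($as_utf8), codepoints($as_utf8);
printf "as Windows-1252: %d character(s): %s\n",
    length($as_cp1252), codepoints($as_cp1252);
```

The file on disk is the same in both cases; only the reader's interpretation of the bytes differs, which is why changing the encoding Word picks can be enough to fix the display.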
Re^8: Parsing a .xlsx file with chinese characters by Sithiris (Novice) on Oct 06, 2011 at 20:04 UTC
Well, I solved my problem using `pack "U0C*", unpack "C*", ($cell->{val});` I'm not entirely sure what that does, but I'm guessing it unpacks the characters into their octets and then repackages them as Unicode characters... Anyway, whatever it does, it worked, so now all the characters are displayed correctly. Thank you for your help :)
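For anyone wondering what that pack/unpack pair does, here is a rough sketch. It assumes the cell value arrives as raw UTF-8 octets that Perl is treating as one character per byte, which is a guess about what the parser returned. `$val` is a hypothetical stand-in for `$cell->{val}`, and the `*` repeat count is used so that every octet is processed, not only the first.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for $cell->{val}: the UTF-8 octets of U+73E0,
# seen by Perl as three separate one-byte characters.
my $val = "\xe7\x8f\xa0";

# unpack "C*" lists the octet values; pack "U0C*" writes them back
# into a string that Perl treats as UTF-8 encoded.
my $via_pack = pack "U0C*", unpack "C*", $val;

# The more conventional route to the same result.
my $via_decode = decode('UTF-8', $val);

printf "before:     %d character(s)\n", length($val);
printf "via pack:   %d character(s), first is U+%04X\n",
    length($via_pack), ord($via_pack);
printf "via decode: %d character(s), first is U+%04X\n",
    length($via_decode), ord($via_decode);
```

If the assumption holds, `Encode::decode` is probably the clearer choice, since it states the encoding by name and can be asked to report malformed input.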
Re^9: Parsing a .xlsx file with chinese characters by anneli (Pilgrim) on Oct 06, 2011 at 22:11 UTC
How curious! I'm glad you got it working. :)
Re^10: Parsing a .xlsx file with chinese characters by Anonymous Monk on Dec 17, 2012 at 22:16 UTC

In Section Seekers of Perl Wisdom