PerlMonks  

Re^7: Parsing a .xlsx file with chinese characters

by anneli (Pilgrim)
on Oct 05, 2011 at 21:55 UTC (#929892)


in reply to Re^6: Parsing a .xlsx file with chinese characters
in thread Parsing a .xlsx file with chinese characters

I think Word is probably half-responsible for the mangling here. If it's displaying each byte as a separate character, then it isn't actually reading the file as UTF-8, but in some other encoding!

I'll give an example using Windows. First, here's utf8.pl:

# U+73E0 ("pearl")
print "\xe7\x8f\xa0";

Now, I execute that and redirect its output to both utf8.html and utf8.txt.

Chrome displays the character correctly, because it assumes UTF-8 by default. Notepad also appears smart enough to guess the encoding.

On my system at least, opening the file with Word prompts me to select the encoding; and by default, it guesses UTF-8 and renders the character correctly. Note that if I pick "Windows (Default)" or "MS-DOS", I get garbage.
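To see why the wrong choice gives garbage, here is a minimal sketch that decodes the same three bytes under different assumed encodings. The helper code_points is a made-up name for illustration, and cp1252 and cp437 are my assumption of what "Windows (Default)" and "MS-DOS" correspond to on a Western-locale system; yours may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes that utf8.pl prints: U+73E0 encoded as UTF-8.
my $bytes = "\xe7\x8f\xa0";

# Hypothetical helper: decode the octets with the given encoding and
# list the resulting code points.
sub code_points {
    my ($encoding, $octets) = @_;
    my $chars = decode($encoding, $octets);
    return join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
}

# Assumed stand-ins for Word's "Windows (Default)" and "MS-DOS" choices.
print "UTF-8:  ", code_points('UTF-8',  $bytes), "\n";
print "cp1252: ", code_points('cp1252', $bytes), "\n";
print "cp437:  ", code_points('cp437',  $bytes), "\n";
```

Read as UTF-8, the bytes come out as the single code point U+73E0; read as a single-byte encoding, each byte becomes its own unrelated character, which is the garbage.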

So try messing with Word a bit; if you use the File -> Open menu (instead of just opening the file from Explorer directly), you can get additional conversion options (sometimes!).

Anne

Re^8: Parsing a .xlsx file with chinese characters
by Sithiris (Novice) on Oct 06, 2011 at 20:04 UTC

    Well, I solved my problem using:

    pack "U0C*", unpack "C*", ($cell->{val});

    I'm not entirely sure what that does, but I'm guessing it unpacks the string into its octets and then repacks them as Unicode characters... Anyway, whatever it does, it worked, so now all the characters are displayed correctly.

    Thank you for your help :)

      How curious! I'm glad you got it working. :)

        Me too. Saved me a few hours, I'm sure.
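For anyone wondering what the pack/unpack line above does: as far as I can tell, unpack "C*" splits the string into its byte values, and pack with a leading U0 reassembles those bytes in UTF-8 mode, so Perl ends up treating them as encoded characters. Here is a minimal sketch comparing it with Encode's decode, assuming the cell value holds raw UTF-8 octets. The literal $octets is a stand-in for $cell->{val}, not the actual spreadsheet data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for $cell->{val}: the raw UTF-8 octets for U+73E0 (assumption).
my $octets = "\xe7\x8f\xa0";

# The idiom from the post: split into byte values, then repack them
# in UTF-8 mode so Perl sees characters instead of bytes.
my $via_pack = pack "U0C*", unpack "C*", $octets;

# A more explicit way to ask for the same conversion.
my $via_encode = decode('UTF-8', $octets);

printf "before: length %d\n", length $octets;
printf "pack:   length %d, first code point U+%04X\n",
    length $via_pack, ord $via_pack;
printf "decode: length %d, first code point U+%04X\n",
    length $via_encode, ord $via_encode;
print "both approaches agree\n" if $via_pack eq $via_encode;
```

On a reasonably modern Perl both conversions should report a single character, U+73E0. The decode form is probably easier for the next reader to follow, and it only makes sense if the value really is undecoded UTF-8; running either on a string that is already decoded would likely mangle it.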
