Re^6: Parsing a .xlsx file with chinese characters

by Sithiris (Novice)
on Oct 05, 2011 at 21:17 UTC

in reply to Re^5: Parsing a .xlsx file with chinese characters
in thread Parsing a .xlsx file with chinese characters

My script prints to a UTF-8 encoded text file, which I then opened in Word. The file opens and displays fine, just with the wrong characters.

What I am thinking is that it may be 'deconstructing' the character: for example, instead of "\x{2013}" it is displaying "\xE2", "\x80", "\x93". If this is the case, is there a way to force it to treat them as one character?
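A quick way to check that guess: in Perl, "\x{2013}" (EN DASH) is one character, and its UTF-8 encoding is exactly the three bytes "\xE2\x80\x93". The sketch below uses the core Encode module to show the round trip, and also what happens if those bytes are never decoded but are still printed through a UTF-8 output layer: each byte is encoded a second time. That double encoding is offered here as one plausible cause of the wrong characters, an assumption rather than something confirmed in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: U+2013 EN DASH.
my $char = "\x{2013}";

# Its UTF-8 encoding is three octets.
my $bytes = encode('UTF-8', $char);
printf "%d char -> %d bytes: %s\n",
    length($char), length($bytes),
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
# prints "1 char -> 3 bytes: E2 80 93"

# Decoding the octets gives the single character back.
my $decoded = decode('UTF-8', $bytes);

# If the octets are NOT decoded first, a UTF-8 output layer sees three
# separate characters (U+00E2, U+0080, U+0093) and encodes each of them
# again, producing six bytes instead of three.
my $double = encode('UTF-8', $bytes);
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $double;
# prints "double-encoded: C3 A2 C2 80 C2 93"
```

So yes, there is a way to "force it": decode the bytes into characters before printing them through a UTF-8 layer.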

Re^7: Parsing a .xlsx file with chinese characters
by anneli (Pilgrim) on Oct 05, 2011 at 21:55 UTC

    I think Word is probably half-responsible for the mangling here. If it's displaying each byte as a separate character, then it isn't actually reading the file as UTF-8, but as some other encoding!

    I'll give an example using Windows. First, here's the script:

        # U+73E0 ("pearl")
        print "\xe7\x8f\xa0";

    Now, I execute that and redirect it to both utf8.html and utf8.txt.

    Chrome displays the character correctly, because it assumes UTF-8 by default. Notepad also appears smart enough to guess the encoding.

    On my system at least, opening the file with Word prompts me to select the encoding; and by default, it guesses UTF-8 and renders the character correctly. Note that if I pick "Windows (Default)" or "MS-DOS", I get garbage.
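    To illustrate where that garbage comes from, the sketch below decodes the three UTF-8 bytes of the en dash from earlier in the thread as Windows-1252. Treating "Windows (Default)" as cp1252 is an assumption (the actual code page depends on the system locale). Each byte becomes its own character, which is exactly the "one character per byte" symptom.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three UTF-8 bytes of U+2013 EN DASH.
my $utf8_bytes = "\xE2\x80\x93";

# Read as UTF-8, they are one character.
my $right = decode('UTF-8', $utf8_bytes);

# Read as Windows-1252 (assumed stand-in for Word's "Windows (Default)"
# on a Western system), every byte becomes a separate character:
# 0xE2 -> U+00E2, 0x80 -> U+20AC, 0x93 -> U+201C.
my $wrong = decode('cp1252', $utf8_bytes);

printf "UTF-8:  %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $right;
printf "cp1252: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $wrong;
# prints "UTF-8:  U+2013" then "cp1252: U+00E2 U+20AC U+201C"
```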

    So try messing with Word a bit; if you use the File -> Open menu (instead of just opening the file from Explorer directly), you can get additional conversion options (sometimes!).
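    If you'd rather not depend on Word guessing, one option (an assumption, not something tested in this thread) is to write the file through a :encoding(UTF-8) layer and start it with a byte order mark, U+FEFF, which Notepad and Word generally treat as a hint that the file is UTF-8. The sketch writes the same "pearl" character to a temporary file standing in for utf8.txt, then reads the raw bytes back to confirm what landed on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for utf8.txt; it is deleted at exit.
my (undef, $path) = tempfile(UNLINK => 1);

# Print characters, not bytes: the layer does the UTF-8 encoding.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "\x{FEFF}";   # BOM, encoded as EF BB BF
print {$out} "\x{73E0}";   # U+73E0 ("pearl"), encoded as E7 8F A0
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see exactly what was written.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $raw = do { local $/; <$in> };
close $in;

print join(' ', map { sprintf '%02X', ord } split //, $raw), "\n";
# prints "EF BB BF E7 8F A0"
```

Printing characters through a layer, rather than hand-written byte escapes, also keeps the script from accidentally encoding the same data twice.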


      Well, I solved my problem using

      pack "U0C*", unpack "C*", ($cell->{val});

      I'm not entirely sure what that does, but I'm guessing it unpacks the characters into their octets and then packs them back up as Unicode characters... Anyway, whatever it does, it worked, so now all the characters are displayed correctly.
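      That guess matches our reading of the idiom, offered as a sketch rather than a definitive explanation: unpack "C*" turns the value into a list of octet values, and a leading "U0" puts pack into UTF-8 mode, so those octets are taken as the UTF-8 encoding of the result and come back as characters. In other words it decodes UTF-8, the same job done by Encode::decode('UTF-8', ...) or the built-in utf8::decode; perlpacktut suggests Encode for the general case, since it can handle malformed input. Here $val is a hypothetical stand-in for $cell->{val}.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for $cell->{val}: the undecoded UTF-8 bytes
# of U+2013 EN DASH.
my $val = "\xE2\x80\x93";

# The idiom from the post: split into octets, then repack them in
# U0 (UTF-8) mode so they are read back as encoded characters.
my $via_pack = pack "U0C*", unpack "C*", $val;

# The same decoding, done with the core Encode module.
my $via_encode = decode('UTF-8', $val);

# utf8::decode modifies its argument in place, so apply it to a copy;
# it returns false if the string is not valid UTF-8.
my $via_utf8 = $val;
utf8::decode($via_utf8) or die "Not valid UTF-8";

for my $s ($via_pack, $via_encode, $via_utf8) {
    printf "length %d, U+%04X\n", length $s, ord $s;
}
```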

      Thank you for your help :)

        How curious! I'm glad you got it working. :)
