|Perl: the Markov chain saw|
Re: Extracting Chinese characters from Excelby davies (Vicar)
|on Mar 14, 2013 at 10:15 UTC||Need Help??|
Never having used Unicode in any way, this looked like a good way for me to start teaching myself. In Excel, it’s certainly more complicated than I expected. The first thing I needed was a way of getting the characters into Excel. I was cursing the OP mildly for not doing this, but it turns out to be very difficult. AFAICT, the characters used are UTF16, while Win32:Ole doesn’t export anything beyond UTF8. Recording a VBA macro results in ActiveCell.FormulaR1C1 = "'??" which, needless to say, is utterly unhelpful. The reason I suspect UTF8 to be inadequate is that passing a single code from Perl results in two characters appearing in the cell. Extensive Gargling reveals nothing on getting more advanced Unicode from Perl to Excel.
I should perhaps point out here that my setup may be partly to blame. Never having had the slightest need to use any of these characters – and not even knowing their language – I have not installed lots of strange fonts in the hope that one of them might fit. But I don’t think that’s the case. I can copy & paste the characters without problems and I CAN enter them using the Insert | Symbol dialogue in Excel.
There is a way to get these characters in from VBA, using the ChrW function, but this isn’t available from the face of the spreadsheet. Wanting a SSCCE (http://sscce.org/), I went down the route shown in http://dailydoseofexcel.com/archives/2012/06/23/unicode-and-diacritic-characters/, getting the VBA in via Perl using the technique shown in RFC Tutorial: Adding and extracting VBA to and from Excel files.
At this stage, I had the data in, but getting it out again was the problem. The OP’s “hex_dump_cell” routine includes an
It isn’t possible to combine the call to the extraction routine in the same cell as the characters to be decoded. Something within Excel converts these to “?” characters when I tried. Therefore, I have had to put formulae in the second row. Since I am not saving the spreadsheet, this should be no problem. The only thing needing care is that the formulae don’t overwrite a cell that will be decoded later. If this happens, things can get recursive very quickly.
I think this mess is best summarised by Chesterton: http://www.gkc.org.uk/gkc/books/rolling.html.