in reply to Re: What to do when converting Excel-supplied data to Unicode
in thread What to do when converting Excel-supplied data to Unicode

Hi, I tried using your formatter class but I do not see any difference in my garbled output. However, if I print the data inside the overriding TextFmt function, it looks correct. I read the data as in the following:

my $formatter = My::Excel::FmtUTF8->new(); my $parser = Spreadsheet::ParseExcel->new(); my $excel = $parser->Parse($file, $formatter); my $sheet = $excel->{Worksheet}->[0]; ... my $cell = $sheet->{Cells}[$row][$col]; my $cellVal = $cell->{Val};

Does that look correct? Why does my $cellVal end up different than $decoded in TextFmt()?

Is there any other (easy/perl) way to read Excel files that have Unicode data? It is confusing to me that I can write Unicode data to an excel file using WriteExcel, but I cannot read it.

Regards,
ITFinanceGuy

Replies are listed 'Best First'.
Re^3: What to do when converting Excel-supplied data to Unicode
by davis (Vicar) on Feb 20, 2009 at 23:03 UTC
    Curious. When do you print $decoded inside the sub? Just before the return?

    davis

      Yes. I just figured it out. I was reading the data using $cell->{Val} which apparently contains the unformatted cell value. I changed to using $cell->value() and that worked (almost..).

      I still cannot read Chinese characters from an Excel file. I am guessing the encoding I pass to the decode() function in TextFmt() is wrong, but I do not know how to find the encoding of Excel cells. I tried 'UTF16-LE', 'UTF16-BE', and 'UTF-8' but none worked. Leaving the TextFmt function as it is posted here does not work. Anyone have an idea as to what I might be doing wrong?

      Yes. I figured out part of the problem. Apparently $cell->{Val} contains the unformatted value of a cell, whereas $cell->value() has the formatted value.

      But I am still having trouble extracting the right data. My output looks garbled. I tried converting from UTF16-LE, UTF16-BE, UTF-8, and nothing seems to work. I even tried downloading several utilities from the web with no luck converting Unicode data. The only thing that does work is if I save the file as a different format from within Excel itself, but I am looking for a batch way to do this.

      Does anyone know of a command line utility that will convert .xls files containing Unicode data into XML? Or a library that works? Thanks in advance!

      Nevermind. I determined that I should be using visual basic + the Microsoft Office Excel reference library to convert the files and then read them with Perl. Following suggestions from:
      http://stackoverflow.com/questions/174446/how-to-automate-converting-excel-xls-files-to-excel-xml-fromat