I went through this problem recently. Hopefully the solution I hacked together will work for you as well. Hopefully you're using Spreadsheet::ParseExcel to read the Excel file or this advice will do you no good! :-)

Spreadsheet::ParseExcel allows you to specify a "Formatter" class when you parse an excel file which gives you some flexibility to decode or convert or do whatever other things you want to do as it reads in the file.

I couldn't get any of the Spreadsheet::ParseExcel::Fmt* modules to work how I wanted (I wanted the data to end up as UTF-8 data) so I wrote my own thin subclass of the Spreadsheet::ParseExcel::FmtDefault module:
package My::Excel::FmtUTF8; use strict; use base 'Spreadsheet::ParseExcel::FmtDefault'; use Encode qw(decode); # the super-class isn't very friendly to sub-classing, so we have to o +verride this to make # sure it's blessed into the right class sub new { my $class = shift; return bless {}, $class; } # the only other method we need to override... sub TextFmt { my ($self,$data,$encoding) = @_; # Spreadsheet::ParseExcel will pass in the encoding to us! # or, it passes nothing in if it's iso-8859-1 $encoding ||= 'iso-8859-1'; # we perform the decoding in a "fatal" manner so that if it fai +ls, # we'll just pass the data back as-is my $decoded = eval { decode($encoding,$data,1) } || $data; return $decoded; }
Then, when you parse the file, you do something like this:
my $parser = Spreadsheet::ParseExcel->new(); my $data = $parser->Parse( $excel_file_name, My::Excel::FmtUTF8->new() );
That seemed to work for all the non-ASCII data that I had to deal with. Hopefully it will work for you too! :-)

-- Brian

In reply to Re: What to do when converting Excel-supplied data to Unicode by bpphillips
in thread What to do when converting Excel-supplied data to Unicode by davis

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.