in reply to What to do when converting Excel-supplied data to Unicode
I ran into this problem recently, and hopefully the solution I hacked together will work for you as well. It assumes you're using Spreadsheet::ParseExcel to read the Excel file; otherwise this advice will do you no good! :-)
Spreadsheet::ParseExcel lets you specify a "Formatter" class when you parse an Excel file, which gives you some flexibility to decode, convert, or otherwise process the data as it is read in.
I couldn't get any of the Spreadsheet::ParseExcel::Fmt* modules to work the way I wanted (I wanted the data to end up as UTF-8), so I wrote my own thin subclass of the Spreadsheet::ParseExcel::FmtDefault module:
    package My::Excel::FmtUTF8;
    use strict;
    use base 'Spreadsheet::ParseExcel::FmtDefault';
    use Encode qw(decode);

    # the super-class isn't very friendly to sub-classing, so we have to override this to make
    # sure it's blessed into the right class
    sub new {
        my $class = shift;
        return bless {}, $class;
    }

    # the only other method we need to override...
    sub TextFmt {
        my ($self,$data,$encoding) = @_;

        # Spreadsheet::ParseExcel will pass in the encoding to us!
        # or, it passes nothing in if it's iso-8859-1
        $encoding ||= 'iso-8859-1';

        # we perform the decoding in a "fatal" manner so that if it fails,
        # we'll just pass the data back as-is
        my $decoded = eval { decode($encoding,$data,1) } || $data;
        return $decoded;
    }

Then, when you parse the file, you do something like this:

    my $parser = Spreadsheet::ParseExcel->new();
    my $data = $parser->Parse( $excel_file_name, My::Excel::FmtUTF8->new() );

That seemed to work for all the non-ASCII data that I had to deal with. Hopefully it will work for you too! :-)
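If you want to try the decode-and-fall-back step from TextFmt on its own, without an Excel file, here is a minimal sketch that uses only the core Encode module. The helper name decode_or_passthrough is made up for illustration and is not part of Spreadsheet::ParseExcel. It is a slight variation on the method above rather than a copy: it adds Encode's LEAVE_SRC flag so the input bytes are not touched, and it checks definedness instead of truth so that a value like "0" is returned as-is.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption for this sketch, not a library function):
# strictly decode raw bytes, handing the input back unchanged if that fails.
sub decode_or_passthrough {
    my ($bytes, $encoding) = @_;

    # same default as in TextFmt above
    $encoding ||= 'iso-8859-1';

    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC asks decode() not to modify $bytes in place
    my $decoded = eval {
        decode($encoding, $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    # fall back to the raw input only when decoding actually failed
    return defined $decoded ? $decoded : $bytes;
}

# Latin-1 bytes for "café": decoded to characters, then written out as UTF-8
my $text = decode_or_passthrough("caf\xE9");
my $utf8 = encode('UTF-8', $text);

# bytes that are not valid UTF-8 come back untouched
my $raw = decode_or_passthrough("\xFF\xFE", 'UTF-8');
```

Note that the decoded value is a Perl character string; it only becomes UTF-8 bytes when you encode it on output, as the encode() call does here.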
-- Brian