Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked

ParseExcel:: wackiness

by misterperl (Beadle)
on Oct 05, 2010 at 15:04 UTC ( #863619=perlquestion: print w/replies, xml ) Need Help??
misterperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello PMs: I've been using ParseExcel forever on various forms. Suddenly a form shows up that appears garbled. Closer inspection shows that apparently somehow a NULL char is being inserted between EVERY character in the {Val} fields. For example a field has this: "My value" Perl instead sees: "\c@M\c@y\c@ \c@V\c@a\c@l\c@u\c@e" My first inclination was my client mucked with the form- but I pulled it into Excel, and went in and checked SHOW CONTROL CHARS and the fields are perfect- no control chars. I wrote a cleanup that zaps all the \c@ with a REGEX, but I'd PREFER to have a clean input. The other strangeness is that there is another hash field called _Value (instead of Val), which looks correct all the time. So I considered switching the code over to just use that instead. Insights and wisdom on this matter are most appreciated. MP

Replies are listed 'Best First'.
Re: ParseExcel:: wackiness
by jmcnamara (Monsignor) on Oct 05, 2010 at 15:29 UTC
    The data being returned is in Excel's UTF-16 Unicode format.

    You will need to specify a Unicode formatter in the parse() method to handle the data. Something like this:

    #!/usr/bin/perl use warnings; use strict; use Spreadsheet::ParseExcel; use Spreadsheet::ParseExcel::FmtJapan; my $filename = 'file.xls'; my $parser = Spreadsheet::ParseExcel->new(); my $formatter = Spreadsheet::ParseExcel::FmtJapan->new(); my $workbook = $parser->parse($filename, $formatter); if ( !defined $workbook ) { die "Parsing error: ", $parser->error(), ".\n"; } # Set your output encoding to something like this. binmode STDOUT, ':utf8'; for my $worksheet ( $workbook->worksheets() ) { print "Worksheet name: ", $worksheet->get_name(), "\n\n"; my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); for my $row ( $row_min .. $row_max ) { for my $col ( $col_min .. $col_max ) { my $cell = $worksheet->get_cell( $row, $col ); next unless $cell; print " Row, Col = ($row, $col)\n"; print " Value = ", $cell->value(), "\n" +; print " Unformatted = ", $cell->unformatted(), "\n" +; print "\n"; } } } __END__

    Ignore the fact that the formatter is called FmtJapan, it is also a general purpose Unicode handler.


      Thanks for that, John! I ran into the same thing a few months ago, and merely did the same as misterperl (edited the wackiness out manually).

      Your module has come in handy many times, I appreciate your work...
      most helpful thanks..

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://863619]
Approved by johngg
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2018-05-24 04:40 GMT
Find Nodes?
    Voting Booth?