Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

The XLSX Perl modules have a simple problem

by BerntB (Deacon)
on Jun 14, 2019 at 09:27 UTC ( #11101336=perlmeditation: print w/replies, xml ) Need Help??


I write here so people can find the problem when they search.

XLSX is a Zipped file format with XML. When some applications (it seems Libre Office and others) save as XLSX, they set the dimension size bad. And all the Perl modules I could find failed on that.

A typical example is that in a sheet file (like xl/worksheets/sheet1.xml, after unpacking) you find

<dimension ref="G2296"/>
The Spreadsheet::ParseXLSX module (and others) seems to expect something like "A1:G2296" there. It sets the values for Min/Max Column/Row wrong.

Spreadsheet::ParseXLSX use Spreadsheet::ParseExcel objects for worksheets etc, so the API is compatible. The get_cell() method there just returns undef if the asked cell is outside the dimension boundaries (which it of course is).

A temporary fix would be to set the MinRow/MinCol/MaxCol/MaxRow values in the spreadsheet object:

for my $s ($parsed_document->worksheets() ) { $s->{MinRow}=0; $s->{MinCol}=0; # $s->{MaxCol}=100; (Edit: Not needed) # $s->{MaxRow}=7000; (Edit: Not needed) ... my $c = $s->get_cell($row, $column); ...

I don't really know enough about the XLSX format. I assume the easy solution is a few lines. Like keeping up max/min column numbers when reading the file. I'll send in a bugfix when I get so far. (With tests so I don't get any grumbles back this time. :-) )

Replies are listed 'Best First'.
Re: The XLSX Perl modules have a simple problem
by BerntB (Deacon) on Jun 14, 2019 at 12:28 UTC
    There wasn't any relevant tests for this sub to extend. But a trivial fix seems to solve the problem for Spreadsheet::ParseXLSX v 0.27:
    # (This is in Spreadsheet/ sub _dimensions { my $self = shift; my ($dim) = @_; + $dim = "A1:$dim" + if $dim !~ /:/; + my ($topleft, $bottomright) = split ':', $dim; $bottomright = $topleft unless defined $bottomright; ...
Re: The XLSX Perl modules have a simple problem
by Laurent_R (Canon) on Jun 14, 2019 at 22:47 UTC
    Hmm, I very much doubt the XLSX format is wrong. It's being used successfully everyday by hundreds of thousands of people, perhaps more. My take is that you must have misunderstood part of it.
      I haven't read the file format specifications, so I am not making any claims about standard compliance. I do see an increasing number of files uploaded by users, mainly from Russia, where the Perl modules fail because of this small problem.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://11101336]
Approved by haukex
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2021-10-26 17:20 GMT
Find Nodes?
    Voting Booth?
    My first memorable Perl project was:

    Results (90 votes). Check out past polls.