Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

ParseExcel difficulties

by hill (Sexton)
on Mar 05, 2011 at 01:32 UTC ( [id://891531]=perlquestion: print w/replies, xml ) Need Help??

hill has asked for the wisdom of the Perl Monks concerning the following question:

My life is controlled by many others--a common complaint, I'm sure. I would like to convert a series of 'xls' files to 'csv' files. I've been successful on Windows machines but some of the controlling individuals have decreed that we're transitioning to Linux. After a bit of exploration, I grabbed Spreadsheet::ParseExcel from CPAN and, still on Windows, processed a single Excel file smoothly. Solved, I thought. The next file (much less complex than the first) produced an "out of memory" error--more poking found this bit (blatantly swiped from another post) that addressed the "out of memory" problem:
#!/usr/bin/perl use strict; use Spreadsheet::ParseExcel; my $parse_excel = Spreadsheet::ParseExcel->new(CellHandler => \&cell_handler, NotSetCell => 1); my $workbook = $parse_excel->Parse('22 JAN 2011 CRAM CFC1.xls'); sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; # Do something useful with the formatted cell value print $cell->{Val}, "\n"; }
Then things got too strange for this Perl duffer. The offending file (79kb) is data from another group. I can't control how it's formatted. If I save the file as a 'csv' file, open that and re-save it in the 'xls' format, the script shrinks significantly and works properly. If I save the original file as another 'xls' file its size grows to 93kb and I get a completely different error. And, if that weren't strange enough, I have another, similar file (similar size) from the same group that produces another error and it shuts down the perl interpreter. So, am I struggling with a ParseExcel problem or, what seems more likely to me, a problem with Excel itself? Or just simple ineptitude? Always a possibility. Many thanks for your attention.

Replies are listed 'Best First'.
Re: ParseExcel difficulties
by GrandFather (Saint) on Mar 05, 2011 at 01:53 UTC

    Spreadsheet::ParseExcel does not use Excel in any way so ultimately the problem is not an Excel issue (even if Excel is generating rubbish output). About the only way forward I can think of is to use Excel to create subsets of the problematic file in an attempt to isolate the issue to as small a test .xls file as possible that still demonstrates the issue then submit a bug report. In the process you may discover the construct in the spreadsheet that is causing grief and maybe can work around it.

    It may help to add logging to cell_handler so you can see what the last cell was before things went pear shaped and can maybe diagnose the problem that way.

    Another possibility is that your cell_handler code has an interesting bug in it leading to recursion (for example) or allocation of a huge data structure. It may help to trim your cell_handler code to a minimum that demonstrates the problem then post that.

    True laziness is hard work
Re: ParseExcel difficulties
by jmcnamara (Monsignor) on Mar 05, 2011 at 10:12 UTC
    Try the latest version of Spreadsheet::ParseExcel which has a parser error() method that may report some useful information:

    ... my $parser = Spreadsheet::ParseExcel->new(); my $workbook = $parser->parse('Book1.xls'); if ( !defined $workbook ) { die $parser->error(), ".\n"; } ...

    It may be that you are trying to parse an encrypted file or an xlsx file.

    That memory reduction technique is also included in the main documentation now.

    --
    John.

Re: ParseExcel difficulties
by davies (Prior) on Mar 05, 2011 at 11:35 UTC

    If you are doing this on a machine that has Excel available, I showed in Re^3: Win32::OLE Excel search and replace commas how to get Excel itself to save as sheet as a CSV using Win32:OLE to control Excel directly rather than ParseExcel.

    Regards,

    John Davies

    Update: fixed minor typo
Re: ParseExcel difficulties
by Monkomatic (Sexton) on Mar 05, 2011 at 06:27 UTC

    Maybe your overthinking the problem.

    You could just try converting all the data from the xls format and separate it by a common non-used deliminator like :::

    data1:::data2:::data3:::

    Then just do a split on all the data and reassign in whatever format you need in the csv

    @stringtempsplit = split(/:::/, $stringtemp);

    ","@stringtempsplit3"."@stringtempsplit[0]","@stringtempsplit4"," etc

    Hope that helps..

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://891531]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (8)
As of 2024-04-23 09:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found