http://www.perlmonks.org?node_id=1016257


in reply to Re^2: parse only one sheet at time In Spreadsheet::ParseExcel
in thread parse only one sheet at time In Spreadsheet::ParseExcel

The thing is, it *does* parse the entire document into memory, and yet it doesn't :). It loads the binary OLE object and parses it, which creates some memory overhead. However, that is still (much) less memory than Spreadsheet::ParseExcel uses.
The difference is that it doesn't *keep* your entire document in memory. As soon as you've read a row or sheet, it is removed from memory. There are also other things it avoids doing with data you haven't yet read from the stream, but I don't know the details.
Bottom line: Spreadsheet::ParseExcel::Stream is not perfect, but it uses a whole lot less memory than Spreadsheet::ParseExcel.
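To illustrate the read-and-discard pattern described above, here is a minimal sketch that walks every sheet one row at a time. It uses only the `sheet` and `row` calls; the file name `Book1.xls` and the row counting are assumptions made for the example, not anything the module requires.

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel::Stream;

# Hypothetical input file; substitute your own .xls workbook.
my $xls_file = 'Book1.xls';

my $xls = Spreadsheet::ParseExcel::Stream->new($xls_file);

my $sheet_count = 0;
while ( my $sheet = $xls->sheet() ) {
    $sheet_count++;
    my $row_count = 0;

    # Each call to row() returns the next row as an array reference.
    # This loop keeps no reference to earlier rows, so only the
    # current row is held by our own code at any time.
    while ( my $row = $sheet->row ) {
        $row_count++;
    }

    print "Sheet $sheet_count: $row_count rows\n";
}
```

Because the loop never stores `$row` anywhere, nothing in this code prevents rows that have already been read from being freed.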

Re^4: parse only one sheet at time In Spreadsheet::ParseExcel
by Kenosis (Priest) on Feb 01, 2013 at 00:57 UTC

    Memory benchmarking of the two modules supports the claim that Spreadsheet::ParseExcel::Stream uses less memory, at least for the task below, run on a 1.9M spreadsheet with 20 sheets, each having 500 x 26 cells filled:

    use strict;
    use warnings;
    use Memchmark qw(cmpthese);
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::Stream;

    my $xls_file = 'Book1.xls';

    cmpthese(
        Spreadsheet_ParseExcel_Stream => sub {
            my $xls = Spreadsheet::ParseExcel::Stream->new($xls_file);
            while ( my $sheet = $xls->sheet() ) {
                my $cellA1 = $sheet->row->[0];
            }
        },
        Spreadsheet_ParseExcel => sub {
            my $parser   = Spreadsheet::ParseExcel->new();
            my $workbook = $parser->parse($xls_file);
            for my $worksheet ( $workbook->worksheets() ) {
                my $cellA1 = $worksheet->get_cell( 0, 0 )->value;
            }
        }
    );

    Results:

    test: Spreadsheet_ParseExcel, memory used: 199147520 bytes
    test: Spreadsheet_ParseExcel_Stream, memory used: 17633280 bytes

    As a side note, Spreadsheet::ParseExcel::Stream is a front end for Spreadsheet::ParseExcel, and its author asserts that its memory management is optimized.