http://www.perlmonks.org?node_id=1014852


in reply to Re: Reducing the memory usage of Spreadsheet::ParseExcel
in thread Reducing the memory usage of Spreadsheet::ParseExcel

I'm not sure if .xls files are compressed as well, but I like to keep the memory footprint of my applications small. If the same amount of data requires (much) less memory when loaded from other types of file (fixed-length files, CSV files, a MySQL database), then I'm going to complain when .xls needs more :)

Re^3: Reducing the memory usage of Spreadsheet::ParseExcel
by Anonymous Monk on Jan 23, 2013 at 08:00 UTC

    then I'm going to complain when .xls needs more :)

    MS-Excel itself probably needs that much

      I've used Spreadsheet::ParseExcel::Stream instead of Spreadsheet::ParseExcel and it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns. This is before I go through the parsed worksheets and rows.
      Opening this in Excel (2007) seems to use only 230MB.
      So even with this module, reading a somewhat large Excel file uses excessively large amounts of memory. Edit: The xls-file is 122MB.
          print("Before creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);
          my $ExcelParser = Spreadsheet::ParseExcel::Stream->new($FileName);
          print("After creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);

      results in

          Before creating new Spreadsheet::ParseExcel::Stream:
                       total       used       free     shared    buffers     cached
          Mem:          3011       1250       1761          0          2        177
          -/+ buffers/cache:       1071       1940
          Swap:          956        177        779
          After creating new Spreadsheet::ParseExcel::Stream:
                       total       used       free     shared    buffers     cached
          Mem:          3011       2257        753          0          2        177
          -/+ buffers/cache:       2077        933
          Swap:          956        177        779
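      Note that `free -m` reports system-wide memory, so other processes can skew the numbers. A rough sketch of measuring only the current process instead, assuming Linux (it reads the VmRSS line from /proc/self/status). The helper name rss_mb and the filler array are made up for illustration; in the real test the filler line would be the Spreadsheet::ParseExcel::Stream->new($FileName) call.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size of this process in MB.
# Assumes Linux; returns undef if /proc/self/status is unavailable.
sub rss_mb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        # The kernel reports VmRSS in kB.
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            close($fh);
            return $1 / 1024;
        }
    }
    close($fh);
    return;
}

my $before = rss_mb();

# Stand-in for the expensive step being measured
# (e.g. constructing the parser in the post above).
my @filler = (1) x 1_000_000;

my $after = rss_mb();

if (defined $before && defined $after) {
    printf("RSS before: %.1f MB, after: %.1f MB, delta: %.1f MB\n",
        $before, $after, $after - $before);
}
else {
    print("VmRSS not available on this platform\n");
}
```

      This only counts pages resident in RAM for this one process, so it is not directly comparable to the "used" column of free, but the before/after delta should be less noisy.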
        ... it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns.

        Yes, the claim of "no memory overhead" is a bit of an exaggeration. But you would get the same result using plain Spreadsheet::ParseExcel with the CellHandler and NotSetCell options (which the S::PE docs correctly say "reduces" memory overhead, rather than claiming "no" memory overhead). Spreadsheet::ParseExcel parses and saves a bunch of stuff before it even gets to the cell data, so S::PE with the CellHandler/NotSetCell options and S::PE::Stream both only save you from storing the cell data in memory. I don't know whether S::PE can be told not to keep the metadata that precedes the cell data, or whether it could be modified to do so. Only jmcnamara would know for sure.
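        For reference, a minimal sketch of the CellHandler/NotSetCell approach, using the handler argument order from the Spreadsheet::ParseExcel docs. The names count_cells and %cells_per_sheet are made up for illustration, and the handler just counts cells; as noted above, this only avoids storing the cell data, not whatever S::PE keeps before it reaches the cells.

```perl
use strict;
use warnings;

# Illustrative accumulator: number of cells seen per worksheet index.
my %cells_per_sheet;

# Called by the parser once per cell. With NotSetCell => 1 the cell
# is not stored in the workbook object after this callback returns.
sub cell_handler {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;
    $cells_per_sheet{$sheet_index}++;
    # Real code would process $cell->value() here instead of counting.
    return;
}

# Hypothetical wrapper: parse one .xls file and return the counts.
sub count_cells {
    my ($file) = @_;

    # Loaded lazily so the handler can be exercised without the module.
    require Spreadsheet::ParseExcel;

    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&cell_handler,
        NotSetCell  => 1,
    );

    my $workbook = $parser->parse($file);
    die $parser->error(), ".\n" unless defined $workbook;

    return \%cells_per_sheet;
}

# Usage: perl count_cells.pl file.xls
if (@ARGV) {
    my $counts = count_cells($ARGV[0]);
    for my $sheet (sort { $a <=> $b } keys %$counts) {
        print("Sheet $sheet: $counts->{$sheet} cells\n");
    }
}
```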

        Spreadsheet::ParseExcel::Stream uses Coro, which could be buggy (or could be being used in a buggy way).