http://www.perlmonks.org?node_id=1016035


in reply to Re^3: Reducing the memory usage of Spreadsheet::ParseExcel
in thread Reducing the memory usage of Spreadsheet::ParseExcel

I've used Spreadsheet::ParseExcel::Stream instead of Spreadsheet::ParseExcel and it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns. This is before I go through the parsed worksheets and rows.
Opening this in Excel (2007) seems to use only 230MB.
So even with this module, reading a somewhat large Excel file uses excessively large amounts of memory. Edit: The xls-file is 122MB.
print("Before creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);
my $ExcelParser = Spreadsheet::ParseExcel::Stream->new($FileName);
print("After creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);
results in
Before creating new Spreadsheet::ParseExcel::Stream:
             total       used       free     shared    buffers     cached
Mem:          3011       1250       1761          0          2        177
-/+ buffers/cache:       1071       1940
Swap:          956        177        779
After creating new Spreadsheet::ParseExcel::Stream:
             total       used       free     shared    buffers     cached
Mem:          3011       2257        753          0          2        177
-/+ buffers/cache:       2077        933
Swap:          956        177        779
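
The "-/+ buffers/cache" used figure goes from 1071 to 2077, i.e. roughly 1006MB, which is where the 1.0GB comes from. Note that `free` reports system-wide numbers, so anything else running on the box is included. A per-process figure can be read from /proc instead. A minimal sketch, assuming Linux; rss_mb is a hypothetical helper name and the array allocation is only a stand-in for the parser call:

```perl
use strict;
use warnings;

# Resident set size of the current process in MB, read from
# /proc/self/status (Linux only). Returns undef if it cannot be read.
sub rss_mb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 / 1024 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $before = rss_mb();
my @filler = (1) x 1_000_000;    # stand-in for Spreadsheet::ParseExcel::Stream->new($FileName)
my $after  = rss_mb();

printf "RSS grew by %.1f MB\n", $after - $before
    if defined $before && defined $after;
```

This only measures the Perl process itself, so the numbers will not match `free -m` exactly.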

Re^5: Reducing the memory usage of Spreadsheet::ParseExcel
by runrig (Abbot) on Jan 30, 2013 at 17:50 UTC
    ... it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns.

    Yes, the claim of "no memory overhead" is a bit of a mistake/exaggeration. But you would get the same result using plain Spreadsheet::ParseExcel with the CellHandler and NotSetCell options (which the S::PE docs correctly say "reduces" memory overhead, not that there is "no" memory overhead). Spreadsheet::ParseExcel parses and saves a bunch of stuff before it even gets to the cell data, so S::PE with the CellHandler/NotSetCell options and/or S::PE::Stream only save you from storing the cell data in memory. I don't know if it would be possible to tell S::PE not to save the metadata that precedes the cell data (or to modify it to do so). Only jmcnamara would know for sure.
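
For reference, the CellHandler/NotSetCell combination mentioned above looks roughly like this. It is a sketch only: the counting logic and the %cells_per_sheet name are made up for illustration, the file name is taken from the command line, and as said above it avoids storing cell data but not the workbook metadata.

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;

my %cells_per_sheet;    # running totals only; no cell data is kept

# Called by the parser once per cell, as soon as the cell has been read.
sub cell_handler {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;

    # Process the value here (print it, load it into a DB, ...) and let it go.
    my $value = $cell->value();
    $cells_per_sheet{$sheet_index}++ if defined $value && length $value;
}

if (@ARGV) {
    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&cell_handler,
        NotSetCell  => 1,    # do not store the cells in the workbook object
    );

    my $workbook = $parser->parse($ARGV[0])
        or die $parser->error(), ".\n";

    printf "sheet %d: %d non-empty cells\n", $_, $cells_per_sheet{$_}
        for sort { $a <=> $b } keys %cells_per_sheet;
}
```

With NotSetCell set, the worksheet objects in the returned workbook have no cells, so all per-cell work has to happen inside the handler.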

Re^5: Reducing the memory usage of Spreadsheet::ParseExcel
by Anonymous Monk on Jan 30, 2013 at 12:31 UTC
    Spreadsheet::ParseExcel::Stream uses Coro, that stuff could be buggy (or being used buggily)
      Well, it is an improvement over Spreadsheet::ParseExcel.
      Before creating new Spreadsheet::ParseExcel and parsing file:
                   total       used       free     shared    buffers     cached
      Mem:          3011        345       2666          0          4        185
      -/+ buffers/cache:        154       2857
      Swap:          956        182        774
      Killed
      (For those not familiar with *nix: when a machine runs out of memory, the kernel starts killing processes to free some. Usually, the first process to go is the one that claimed the most, which in this case is my test program that tries to parse the xls-file.)
      From dmesg:
      [10998050.203177] Out of memory: Kill process 32189 (testinterfaces.) score 676 or sacrifice child
      [10998050.203180] Killed process 32189 (testinterfaces.) total-vm:2893916kB, anon-rss:1958024kB, file-rss:0kB

        Well, it is an improvement over Spreadsheet::ParseExcel

        But it still uses Spreadsheet::ParseExcel underneath, and on top of that it also uses Coro