PerlMonks  

Re^3: Reducing the memory usage of Spreadsheet::ParseExcel

by Anonymous Monk
on Jan 23, 2013 at 08:00 UTC (#1014856)


in reply to Re^2: Reducing the memory usage of Spreadsheet::ParseExcel
in thread Reducing the memory usage of Spreadsheet::ParseExcel

then I'm going to complain when .xls needs more :)

MS-Excel itself probably needs that much


Re^4: Reducing the memory usage of Spreadsheet::ParseExcel
by Neighbour (Friar) on Jan 30, 2013 at 11:18 UTC
    I've used Spreadsheet::ParseExcel::Stream instead of Spreadsheet::ParseExcel and it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns. This is before I go through the parsed worksheets and rows.
    Opening this in Excel (2007) seems to use only 230MB.
    So even with this module, reading a somewhat large Excel file uses excessively large amounts of memory. Edit: The xls-file is 122MB.
    print("Before creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);
    my $ExcelParser = Spreadsheet::ParseExcel::Stream->new($FileName);
    print("After creating new Spreadsheet::ParseExcel::Stream:\n" . `free -m`);
    results in
    Before creating new Spreadsheet::ParseExcel::Stream:
                 total       used       free     shared    buffers     cached
    Mem:          3011       1250       1761          0          2        177
    -/+ buffers/cache:       1071       1940
    Swap:          956        177        779
    After creating new Spreadsheet::ParseExcel::Stream:
                 total       used       free     shared    buffers     cached
    Mem:          3011       2257        753          0          2        177
    -/+ buffers/cache:       2077        933
    Swap:          956        177        779
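A side note on the measurement: `free -m` reports system-wide figures, so anything else running on the box shows up in the difference. A minimal sketch of a per-process alternative follows, assuming Linux with /proc mounted. The helper name `rss_mb` and the ballast array are hypothetical stand-ins, not part of the original test program.

```perl
use strict;
use warnings;

# Return the resident set size of the current process in MB,
# read from /proc/self/status (Linux only). Returns undef elsewhere.
sub rss_mb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        # The line of interest looks like: "VmRSS:     12345 kB"
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            close($fh);
            return $1 / 1024;
        }
    }
    close($fh);
    return;
}

my $before  = rss_mb();
my @ballast = (1) x 1_000_000;    # stand-in for the real work, e.g. creating the parser
my $after   = rss_mb();

if (defined $before && defined $after) {
    printf("RSS before: %.1f MB, after: %.1f MB\n", $before, $after);
}
```

In the test above, the two `rss_mb()` calls would go where the two `free -m` calls are, around the `->new($FileName)` call.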
      Spreadsheet::ParseExcel::Stream uses Coro; that stuff could be buggy (or be used in a buggy way).
        Well, it is an improvement over Spreadsheet::ParseExcel.
        Before creating new Spreadsheet::ParseExcel and parsing file:
                     total       used       free     shared    buffers     cached
        Mem:          3011        345       2666          0          4        185
        -/+ buffers/cache:        154       2857
        Swap:          956        182        774
        Killed
        (For those not familiar with *nix: when a machine runs out of memory, the kernel starts killing processes to free some. Usually the first process to go is the one that claimed the most, which in this case is my test program that tries to parse the xls-file.)
        From dmesg:
        [10998050.203177] Out of memory: Kill process 32189 (testinterfaces.) score 676 or sacrifice child
        [10998050.203180] Killed process 32189 (testinterfaces.) total-vm:2893916kB, anon-rss:1958024kB, file-rss:0kB
      ... it uses 1.0GB memory after parsing an xls-file with a single worksheet and 64772 rows and 120 columns.

      Yes, the claim of "no memory overhead" is a bit of a mistake/exaggeration. But you would get the same result using plain Spreadsheet::ParseExcel with the CellHandler and NotSetCell options (which the S::PE documentation correctly describes as "reducing" memory overhead, not as having "no" memory overhead). Spreadsheet::ParseExcel parses and stores a good deal of workbook metadata before it even gets to the cell data, so S::PE with the CellHandler/NotSetCell options, S::PE::Stream, or both only save you from storing the cell data in memory. I don't know whether S::PE can be told not to keep that metadata (or whether it could be modified to do so). Only jmcnamara would know for sure.
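For reference, a minimal sketch of the CellHandler/NotSetCell approach mentioned above. The handler name, the `$cells_seen` counter and taking the file path from `@ARGV` are assumptions for illustration. This only avoids keeping cell data in memory; whatever S::PE stores before reaching the cells is still loaded.

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;

my $cells_seen = 0;

# Called once per cell while the file is being parsed. S::PE passes
# the workbook, sheet index, row, column and cell object, in that order.
sub cell_handler {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;
    $cells_seen++;

    # Do the per-cell work here. As a demonstration, print only the
    # first row of each sheet rather than every cell.
    if ($row == 0) {
        print "sheet $sheet_index, col $col: ", $cell->value(), "\n";
    }
    return;
}

my $file = shift @ARGV;    # hypothetical: path to the .xls file
if (defined $file) {
    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&cell_handler,
        NotSetCell  => 1,    # do not store cells in the workbook object
    );
    my $workbook = $parser->parse($file);
    die $parser->error(), ".\n" unless defined $workbook;
    print "Cells handled: $cells_seen\n";
}
```

Run it as `perl script.pl big.xls` and compare the memory use against a plain `parse()` without the two options to see how much of the total is cell data.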
