http://www.perlmonks.org?node_id=1016558


in reply to Out of memory problem in Windows XP 32-bit: only 1GB occupied

Thanks! I appreciate the insights.

The original Excel XLSX file is only 123MB but it has one spreadsheet with 1 million rows. As I would expect, the Perl modules use more memory than the XLSX file size, probably due to hashes.

I understand that it is a very large file for Excel but that is what the client is bringing.

I'm leaning towards reading the file with another language (Java maybe) and convert it into text and then read it with Perl. Convoluted unfortunately. I will also check the Win32::OLE suggestion.

Replies are listed 'Best First'.
Re^2: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by marto (Cardinal) on Feb 01, 2013 at 14:10 UTC

    IIRC XLSX format is actually compressed (zipped) XML. While your file may show as 123MB as a compressed file, working with it's contents will require more memory.

Re^2: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by BrowserUk (Patriarch) on Feb 01, 2013 at 14:17 UTC

    As marto points out, .xlsx files are compressed XML, so consider unzipping it and processing it with XML:Twig or similar.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by dasgar (Priest) on Feb 01, 2013 at 14:21 UTC

    Outside of your script, are there plans for a user to open this Excel spreadsheet using Excel? If so, you may be close to or exceeding the max number of rows for a worksheet depending on which version of Excel that you're using.

    Pulling up the help in my Excel 2010, I found the following information:

    "In Excel 2010 and Excel 2007, the worksheet size is 16,384 columns by 1,048,576 rows, but the worksheet size of Excel 97-2003 is only 256 columns by 65,536 rows. Data in cells outside of this column and row limit is lost in Excel 97-2003."

    If there's any potential for the number of rows to significantly increase, it sounds like you'll have issues opening the file in Excel. Just wanted to point out this limitation that your exceeding (Excel 97-2003) or getting close to exceeding (Excel 2007, 2010).