Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Out of memory problem in Windows XP 32-bit: only 1GB occupied

by blanes (Initiate)
on Jan 31, 2013 at 17:11 UTC ( #1016362=perlquestion: print w/ replies, xml ) Need Help??
blanes has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have been looking for an answer to my question but I cannot quite find it anywhere...

I have code that loads an Excel XLSX file into memory and then analyzes its content. However, when I load a very large file I run into the "Out of memory" error. Checking the Windows Task Manager I see that Perl is only using around 1GB and that there is memory left since my computer has 4GB of RAM. I have tried using the modules Spreadsheet::XLSX and SimpleXlsx but both run into the same issue. My question is: Is this 1GB limit a problem with Perl? Has anybody found the same issue? Is there a parameter that can be changed as the -Xmx in Java? A 32-bit OS should be able to handle all 4GB.

I have tried my code with both ActivePerl (version 5.16.2) and Strawberry Perl (version 5.16.2.1-32bit). My OS is Windows XP Professional 32-bit.

This is how I read the file with one module:

use SimpleXlsx; my($xlsx) = SimpleXlsx->new(); my($worksheets) = $xlsx->parse($file_name);

With the other module:

use Spreadsheet::XLSX; $excel = Spreadsheet::XLSX -> new ($file_name);

Comment on Out of memory problem in Windows XP 32-bit: only 1GB occupied
Select or Download Code
Re: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by BrowserUk (Pope) on Jan 31, 2013 at 18:10 UTC
    A 32-bit OS should be able to handle all 4GB.

    That's not so. 32-bit windows only make 2GB of virtual memory available to each process; the other 2GB is reserved for use by the OS.

    That can be extended to 3GB using a startup switch, but then executables need to be re-built with a link time option to make use of it.

    "Out of memory" error. Checking the Windows Task Manager I see that Perl is only using around 1GB.

    It sounds like the program is trying to extend a hash that currently has consuming at or close to 1GB of data; When perl needs to extend a hash, it doubles the size of the base element; that may well be pushing you over the 2GB limit; hence you run out of memory.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by dasgar (Deacon) on Jan 31, 2013 at 19:10 UTC

    Out of curiosity, since you're on Windows, do you happen to have Excel installed?

    If so, you could always try using Win32::OLE to control Excel for your manipulation of the Excel file. Of course, I don't think that I personally have ever tried using Excel to open a 1 GB sized file before.

Re: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by sundialsvc4 (Abbot) on Jan 31, 2013 at 22:36 UTC

    I would concur with both of the above two opinions.   Certainly, in my humble, the overwhelming “10,000 pound elephant” here in this closet, is the fact that this Excel file is anywhere close to 1 gigabyte in size.   This spreadsheet is trying to be “a database” when what it ought to be doing is to reference an external data-source, i.e. a database.   The present state-of-things has “seriously outgrown its breeches,” as such things so often do, such that Perl is an innocent bystander.

    Then, in any case, I think that it should be pushing the responsibility over to “Excel, itself,” by means of OLE.

    Very soon now, though, this problem is going to need to be dealt with at its true source:   a single spreadsheet file was never truly intended (IMHO ...) to single-handedly deal with that much data.   Excel does have excellent facilities for handling external data sources, and SQLite (which contains everything in a single external file and does not require a separate server) ... or even MS-Access (ditto)... are among those available sources.   This company has IMHO waited much too long to deal with this issue already.   It will be very pleased with the results when it has “crossed that line” ... as in, “what took us so long?”

    Perl, truly, is not to blame here.   Go and solve this problem properly, strictly in Excel terms, and it will be gone for good.   Don’t waste time trying to monkey with Perl in this case...

Re: Out of memory problem in Windows XP 32-bit: only 1GB occupied
by blanes (Initiate) on Feb 01, 2013 at 14:00 UTC

    Thanks! I appreciate the insights.

    The original Excel XLSX file is only 123MB but it has one spreadsheet with 1 million rows. As I would expect, the Perl modules use more memory than the XLSX file size, probably due to hashes.

    I understand that it is a very large file for Excel but that is what the client is bringing.

    I'm leaning towards reading the file with another language (Java maybe) and convert it into text and then read it with Perl. Convoluted unfortunately. I will also check the Win32::OLE suggestion.

      IIRC XLSX format is actually compressed (zipped) XML. While your file may show as 123MB as a compressed file, working with it's contents will require more memory.

      As marto points out, .xlsx files are compressed XML, so consider unzipping it and processing it with XML:Twig or similar.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      Outside of your script, are there plans for a user to open this Excel spreadsheet using Excel? If so, you may be close to or exceeding the max number of rows for a worksheet depending on which version of Excel that you're using.

      Pulling up the help in my Excel 2010, I found the following information:

      "In Excel 2010 and Excel 2007, the worksheet size is 16,384 columns by 1,048,576 rows, but the worksheet size of Excel 97-2003 is only 256 columns by 65,536 rows. Data in cells outside of this column and row limit is lost in Excel 97-2003."

      If there's any potential for the number of rows to significantly increase, it sounds like you'll have issues opening the file in Excel. Just wanted to point out this limitation that your exceeding (Excel 97-2003) or getting close to exceeding (Excel 2007, 2010).

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1016362]
Approved by talexb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (7)
As of 2014-09-30 21:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (384 votes), past polls