Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Out of Memory while using Spreadsheet::XLSX

by Anonymous Monk
on Jul 10, 2014 at 12:10 UTC ( [id://1093045]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm using Spreadsheet::XLSX to parse the xlsx file. for around 20000 rows , the memory it takes around 800 MB, though size of file is 5 MB. Can anybody helps me to optimize it? following is the code:

my ($self, $in_file, $out_file) = @_; $count = @_; $count -= 1; if(2 != $count) { util->logit("[$PARENT_PROC_NAME]: xlsx_to_csv:invalid args [@_ +] [$count]"); return $FAILURE; } my $excel = Spreadsheet::XLSX -> new ($in_file); my $line; my $out_csv_name = $out_file; my $orig_out_file = $out_csv_name; #binmode CSV; binmode CSV, ":utf8"; my $Csv = Text::CSV_XS->new({ 'quote_char' => '"', 'escape_char' => '"', 'sep_char' => ',', 'binary' => 1, }); my $i = 1; foreach my $sheet (@{$excel -> {Worksheet}}) { $i++; $out_csv_name = "$orig_out_file$underscore$sheet->{Name}.csv"; #$out_csv_name =~ s/\s+//g; @ret_array[$i] = $out_csv_name; open CSV, "> $out_csv_name" || die "Cannot create csv file: $! +" ; $sheet -> {MaxRow} ||= $sheet -> {MinRow}; foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) { my @Row; $sheet -> {MaxCol} ||= $sheet -> {MinCol}; foreach my $col ($sheet -> {MinCol} .. $sheet -> {MaxCol} +) { my $cell = $sheet -> {Cells} [$row] [$col]; my $Value = ""; if ($cell) { #$Value = $cell->Value; #rounds off #$Value = $cell->{_Value}; #rounds off #$Value = $cell->Value(); #rounds off #$Value = $cell->{Val}; $Value = $cell->unformatted(); #my $avl = $sheet->{Cells}[$row][$col]{Val}; #print "avl:[$avl]\n"; #$Value = $cell->value(); if ($Value eq 'GENERAL') { $Value = $cell->{Val}; } } last if $col == $sheet->{MinCol} and !$Value; push(@Row, $Value); } next unless @Row; my $Status = $Csv->combine(@Row); if (!defined $Status) { my $Error = $Csv->error_input(); warn "ERROR FOUND!: $Error"; } if (defined $Status) { my $Line = $Csv->string(); print CSV "$Line\n";

I don't know how to do it...I'm new to perl. Kindly guide me.

Replies are listed 'Best First'.
Re: Out of Memory while using Spreadsheet::XLSX
by sn1987a (Deacon) on Jul 10, 2014 at 12:43 UTC
    ,p>I don't know about Spreadsheet::XLSX but Spreadsheet::ParseExcel, on which it is modeled, has the ability to specify a cell handler. This avoids the need to store the entire parsed spreadsheet. You just the extract the data you need as it is being parsed.

    Its use in controling memory usage is discussed in the documentation at Reducing-the-memory-usage-of-Spreadsheet::ParseExcel of Spreadsheet::ParseExcel

    Update: Added a little more expanation

        Thanks for the update runrig. I was unable to check that at the time I made my response.
Re: Out of Memory while using Spreadsheet::XLSX
by runrig (Abbot) on Jul 10, 2014 at 16:45 UTC
    You don't say whether you run out of memory (1) as soon as the file is parsed at:
    my $excel = Spreadsheet::XLSX->new($in_file);

    ..or (2) if you are running out of memory while you are saving every row of each sheet to an array.

    If it's the first case, then there was a start at writing a new XLSX parser, but work on it has stopped. It does work, though it is in a sort of alpha state, so you can try that.

    If it's the second case, then don't do that.

      Hi all, glad to have all the responses from all you guys :). I'm using ParserExcel for .xls files but this is unable to parse .xlsx files. So I'm using Spreadsheet::XLSX. I have decided to use this module after a lot of R&D, this was the only way I found out which worked for me, but after testing this on large files it takes hell lot of memory for files of 5MB. Is there any way to minimize it as there is cell_handler in ParseExcel??

        Hi runring, I'm really so sorry I didn't get what you said...Could you please explain it?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1093045]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2024-04-20 03:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found