Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Reducing the memory usage of Spreadsheet::ParseExcel

by cibien (Novice)
on Jan 21, 2013 at 12:54 UTC ( #1014438=perlquestion: print w/ replies, xml ) Need Help??
cibien has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks, I am very novice in Perl and I found a perl code and I try to speed up it. The perl code parse excel files to one file xml and it work well, but is very slow for large files(10MB). here is write how Reducing the memory usage of Spreadsheet::ParseExcel. http://www.perlmonks.org/index.pl?node_id=379743 But I'am too much novice in perl, can you help me to modify the code? There is another way to speed up this perl code? here is the code:

#!/usr/bin/perl -w use Encode; use utf8; use XML::LibXML; use Spreadsheet::ParseExcel; system("cls"); # Read command line arguments #--------------------------------------------------------------------- +------------------- my $materialmapping_file = shift; my $data_folder = shift; print "Material mapping generator - Version 0.0.5 - Final \nmaterialma +pping_file:$materialmapping_file\ndata_folder:$data_folder\n"; # subroutine sub getSQLTimeStamp { my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=localt +ime(time); return sprintf "%4d-%02d-%02d %02d:%02d:%02d",$year+1900,$mon+1 +,$mday,$hour,$min,$sec; } # Main #--------------------------------------------------------------------- +------------------- unless (defined $materialmapping_file and defined $data_folder) { print "Usage: $0 <materialmapping_file> <data_folder>\n"; exit; } opendir ( DIR, $data_folder ) || die "Error in opening dir $data_f +older\n"; my $materialmapping_table_xml = XML::LibXML->createDocument( "1.0" +, "UTF-8"); my $materialmapping_table_xml_root = $materialmapping_table_xml->c +reateElement("masterdata"); $materialmapping_table_xml_root->setAttribute('version',getSQLTime +Stamp()); $materialmapping_table_xml->setDocumentElement($materialmapping_ta +ble_xml_root); print "loading...\n"; while(($filename = readdir(DIR))){ next unless $filename =~ /\.xls$/i; print " - $filename \n"; my $parser = Spreadsheet::ParseExcel->new(); my $oBook = $parser->parse($data_folder.$filename); next unless defined $oBook; for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iS +heet++) { $oWkS = $oBook->{Worksheet}[$iSheet]; # find the needed columns my $map_Cmin = -1; # the first is internal my $map_Cmax = -1; my $artno_col = -1; my $active_col = -1; my $colorzones_col = -1; my $family_col = -1; my $tec_desc_col = -1; my $brand_Cmin = -1; my $brand_Cmax = -1; my $customer_Cmin = -1; my $customer_Cmax = -1; my $market_Cmin = -1; my $market_Cmax = -1; my $title_row = -1; for (my $iC = $oWkS->{MinCol}; defined $oWkS->{Max +Col} && $iC <= $oWkS->{MaxCol} ; $iC++) { $oWkC = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]; if(defined $oWkC) { if(decode('cp1252',$oWkC->{Val}) eq "Confi +gurator mapping") { for (my $iR = $oWkS->{MinRow} +1; defi +ned $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) { $oWkC_int = $oWkS->{Cells}[$iR][$i +C]; if(defined $oWkC_int and decode('c +p1252',$oWkC_int->{Val}) eq "internal") { $title_row = $iR; } } if($title_row > 0) { foreach my $area ( @{ $oWkS->{Merg +edArea} } ) { if($area->[1] eq $iC and $area +->[0] eq $oWkS->{MinRow}){ $map_Cmax = $area->[3]; } } $map_Cmin = $iC; $artno_col = $map_Cmin -2; $active_col = $map_Cmin -1; $family_col = $map_Cmin +1; $colorzones_col = $map_Cmax +2; $tec_desc_col = $map_Cmax +1; } } elsif(decode('cp1252',$oWkC->{Val}) eq " +Market"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $market_Cmax = $area->[3]; } } $market_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Customer"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $customer_Cmax = $area->[3]; } } $customer_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Brand"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $brand_Cmax = $area->[3]; } } $brand_Cmin = $iC; } } } if($map_Cmin >= 0 and $map_Cmin >= 0 and $title_ro +w >= 0){ for(my $iR = $title_row +1; defined $oWkS->{Ma +xRow} && $iR <= $oWkS->{MaxRow} ; $iR++){ $internal_cell = $oWkS->{Cells}[$iR][$map_ +Cmin]; $active_cell = $oWkS->{Cells}[$iR][$active +_col]; $family_cell = $oWkS->{Cells}[$iR][$family +_col]; if(defined $internal_cell and defined $act +ive_cell and defined $family_cell and length(decode('cp1252',$interna +l_cell->{Val})) gt 0 and !(length(decode('cp1252',$active_cell->{Val} +)) gt 0) and length(decode('cp1252',$family_cell->{Val})) gt 0) { #fo every family print a line my @families = split(/[,.]+/, decode(' +cp1252',$family_cell->{Val})); foreach $family(@families){ my $prnumbero = 0; + my $materialmapping_item = $materialmapping_table_xml->createElemen +t("item"); + + #set family $materialmapping_item->setAttribut +e("pr_family",$family); #set item_number $item_number_cell = $oWkS->{Cells} +[$iR][$artno_col]; if(defined $item_number_cell and l +ength(decode('cp1252',$item_number_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("item_number",decode('cp1252',$item_number_cell->{Val})); } + #add all propriety for(my $prC2 = $map_Cmin+2; $prC2 +<= $map_Cmax ; $prC2++) { $pr_cell_name2 = $oWkS->{C +ells}[$title_row][$prC2]; $pr_cell2 = $oWkS->{Cells} +[$iR][$prC2]; if(defined $pr_cell2 and d +efined $pr_cell_name2 and length(decode('cp1252',$pr_cell2->{Val})) g +t 0 and length(decode('cp1252',$pr_cell_name2->{Val})) gt 0) { + my @pr2 = split(/[,.]+/, d +ecode('cp1252',$pr_cell2->{Val})); + foreach $pr2(@pr2){ + + + $materialmapping_item->setAttribute(decode( +'cp1252',$pr_cell_name2->{Val}),$pr2); + + } + } + } if ($prnumbero eq 0){ + #set internal $materialmapping_item->setAttribut +e("internal",decode('cp1252',$internal_cell->{Val})); #set colorzones $colorzones_cell = $oWkS->{Cells}[ +$iR][$colorzones_col]; if(defined $colorzones_cell and le +ngth(decode('cp1252',$colorzones_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("colorzone",decode('cp1252',$colorzones_cell->{Val})); } #set description $description_cell = $oWkS->{Cells} +[$iR][$tec_desc_col]; if(defined $description_cell and l +ength(decode('cp1252',$description_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("description",decode('cp1252',$description_cell->{Val})); } #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } } if(length($brand_string) gt 0) +{ $materialmapping_item->set +Attribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item->set +Attribute('ignore_brand',$ignored_brand_string); } } #set Customer if($customer_Cmin gt 0 and $custom +er_Cmax gt 0) { my $customer_string = ""; my $ignored_customer_string = +""; for(my $customerC = $customer_ +Cmin; $customerC <= $customer_Cmax ; $customerC++) { $customer_cell_name = $oWk +S->{Cells}[$title_row][$customerC]; $customer_cell = $oWkS->{C +ells}[$iR][$customerC]; if(defined $customer_c +ell_name and defined $customer_cell and length(decode('cp1252',$custo +mer_cell->{Val})) gt 0 and length(decode('cp1252',$customer_cell_name +->{Val})) gt 0) { if(decode('cp1252' +,$customer_cell->{Val}) eq '0') { $ignored_custo +mer_string = $ignored_customer_string . decode('cp1252',$customer_cel +l_name->{Val}) . ","; } else { $customer_stri +ng = $customer_string . decode('cp1252',$customer_cell_name->{Val}) . + ","; } } } if(length($customer_string) gt + 0){ $materialmapping_item->set +Attribute('customer',$customer_string); } if(length($ignored_customer_st +ring) gt 0){ $materialmapping_item->set +Attribute('ignore_customer',$ignored_customer_string); } } #set Market if($market_Cmin gt 0 and $market_C +max gt 0) { my $market_string = ""; my $ignored_market_string = "" +; for(my $marketC = $market_Cmin +; $marketC <= $market_Cmax ; $marketC++) { $market_cell_name = $oWkS- +>{Cells}[$title_row][$marketC]; $market_cell = $oWkS->{Cel +ls}[$iR][$marketC]; if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } } if(length($market_string) gt 0 +){ $materialmapping_item->set +Attribute('market',$market_string); } if(length($ignored_market_stri +ng) gt 0){ $materialmapping_item->set +Attribute('ignore_market',$ignored_market_string); + } } + + $materialmapping_table_xml_root->addChild($materialmapping_item); + } } } } } } } $materialmapping_table_xml->toFile($materialmapping_file,2);

thank you very much and sorry for my english :) Andrea

Comment on Reducing the memory usage of Spreadsheet::ParseExcel
Download Code
Re: Reducing the memory usage of Spreadsheet::ParseExcel
by Corion (Pope) on Jan 21, 2013 at 12:59 UTC
      Yea, tried that..see my scratchpad under Excel reading issues.
      It didn't save nearly enough memory (a 45MB xls still needed around 300MB memory to load) (which you confirmed :P).
      Thanks for answer. yes I looked section on "Reducing the memory usage of Spreadsheet::ParseExcel" but I'am novice in Perl and I don't know how put this code:
      #!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel; my $parser = Spreadsheet::ParseExcel->new( CellHandler => \&cell_handler, NotSetCell => 1 ); my $workbook = $parser->parse('file.xls'); sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; # Do something useful with the formatted cell value print $cell->value(), "\n"; }
      and how adapt it for my code. I have spend a lot of hours but nothing... help me please :(
        Nothing? please help me
Re: Reducing the memory usage of Spreadsheet::ParseExcel
by Anonymous Monk on Jan 21, 2013 at 20:03 UTC
    The 45MB file is compressed on-disk. Why are you particularly alarmed that it might take 300MB to represent in memory .. and, why does such a "small" amount concern you?
      I'm not sure if .xls files are compressed as well, but I like to keep the memory-footprint of my applications small. If the same amount of data requires (much) less memory when loaded from other types of file (fixed length files, csv files, mysql database), then I'm going to complain when .xls needs more :)

        then I'm going to complain when .xls needs more :)

        MS-Excel itself probably needs that much

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1014438]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2014-08-28 04:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (256 votes), past polls