http://www.perlmonks.org?node_id=994817

cibien has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I make this perl code (convert excel files in one xml file), it works well but is too much slow, 4 minute to create the xml file (12mb). Can you help me to modify this perl code to speed up the conversion? thanks :) Andrea
#!/usr/bin/perl -w ######################### my $version="0.0.4"; # 2012.04.12 v0.0.1 DM: Creation # 2012.04.23 v0.0.2 DM: Added support for active column and comma s +epareted families # 2012.07.30 v0.0.3 DM: Added support for comma ',' or point '.' mu +ltiple separated propriety - (only one multiple propriety cell for li +ne); # Optimize and relived the perl code removing unused +perl module: (File::Find - Spreadsheet::ParseExcel - Switch); # 2012.08.28 v0.0.4 DM: Added support for ONE, TWO or THREE comm +a (or point) separated propriety for excel line #Excel format rules: #-a merged cell named "Configurator mapping" on the first line identif +ies the mapping data (internal+properties) #-a merged cell named "Brand" on the first line identifies the Brand d +ata (optional) #-a merged cell named "Market" on the first line identifies the Market + data (optional) #-a merged cell named "Customer" on the first line identifies the Cust +omer data (optional) #-the first column of "Configurator mapping" should have an "internal" + as cell(line number is not important), the "internal" define the tit +le bar line; the cell must be on the same line where all the properti +es/markets/brand/customer names are #-the "active" column must be exactly left of the "internal" column #-the commercial code column must be exacly left of the "active" colum +n #-the "pr_family" column should be exactly right of the "internal" col +umn #-the tecnical description must be exactly right of the last "Configur +ator mapping" column #-the colorzones must be exactly right of the last "tecnical descripti +on" column #Notes: #-in the field "pr_family" the user can concatenate the family value l +ike 'TL1_1,TL1_4' , every value generate a line in the xml file with +the right family #-in the colums "propriety" the user can concatenate the propriety val +ue like 'value1,value2' (one two or three pr colum for excel line), e +very value add a line in the xml file with the right propriety #-if the "active" column is not empty, the line is not evaluated #-if "internal" or "pr_family" field is empty, the line is not evaluat +ed # ######################### use Encode; use utf8; use XML::LibXML; use Spreadsheet::ParseExcel; system("cls"); # Read command line arguments #--------------------------------------------------------------------- +------------------- my $materialmapping_file = shift; my $data_folder = shift; print "Material mapping generator - Version 0.0.4 \nmaterialmapping_fi +le:$materialmapping_file\ndata_folder:$data_folder\n"; # subroutine sub getSQLTimeStamp { my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=localt +ime(time); return sprintf "%4d-%02d-%02d %02d:%02d:%02d",$year+1900,$mon+1 +,$mday,$hour,$min,$sec; } # Main #--------------------------------------------------------------------- +------------------- unless (defined $materialmapping_file and defined $data_folder) { print "Usage: $0 <materialmapping_file> <data_folder>\n"; exit; } opendir ( DIR, $data_folder ) || die "Error in opening dir $data_f +older\n"; my $materialmapping_table_xml = XML::LibXML->createDocument( "1.0" +, "UTF-8"); my $materialmapping_table_xml_root = $materialmapping_table_xml->c +reateElement("masterdata"); $materialmapping_table_xml_root->setAttribute('version',getSQLTime +Stamp()); $materialmapping_table_xml->setDocumentElement($materialmapping_ta +ble_xml_root); print "loading...\n"; while( ($filename = readdir(DIR))){ next unless $filename =~ /\.xls$/i; my $parser = Spreadsheet::ParseExcel->new(); my $oBook = $parser->parse($data_folder.$filename); next unless defined $oBook; for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iS +heet++) { $oWkS = $oBook->{Worksheet}[$iSheet]; # find the needed columns my $map_Cmin = -1; # the first is internal my $map_Cmax = -1; my $artno_col = -1; my $active_col = -1; my $colorzones_col = -1; my $family_col = -1; my $tec_desc_col = -1; my $brand_Cmin = -1; my $brand_Cmax = -1; my $customer_Cmin = -1; my $customer_Cmax = -1; my $market_Cmin = -1; my $market_Cmax = -1; my $title_row = -1; for (my $iC = $oWkS->{MinCol}; defined $oWkS->{Max +Col} && $iC <= $oWkS->{MaxCol} ; $iC++) { $oWkC = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]; if(defined $oWkC) { if(decode('cp1252',$oWkC->{Val}) eq "Confi +gurator mapping") { for (my $iR = $oWkS->{MinRow} +1; defi +ned $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) { $oWkC_int = $oWkS->{Cells}[$iR][$i +C]; if(defined $oWkC_int and decode('c +p1252',$oWkC_int->{Val}) eq "internal") { $title_row = $iR; } } if($title_row > 0) { foreach my $area ( @{ $oWkS->{Merg +edArea} } ) { if($area->[1] eq $iC and $area +->[0] eq $oWkS->{MinRow}){ $map_Cmax = $area->[3]; } } $map_Cmin = $iC; $artno_col = $map_Cmin -2; $active_col = $map_Cmin -1; $family_col = $map_Cmin +1; $colorzones_col = $map_Cmax +2; $tec_desc_col = $map_Cmax +1; } } elsif(decode('cp1252',$oWkC->{Val}) eq " +Market"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $market_Cmax = $area->[3]; } } $market_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Customer"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $customer_Cmax = $area->[3]; } } $customer_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Brand"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $brand_Cmax = $area->[3]; } } $brand_Cmin = $iC; } } } if($map_Cmin >= 0 and $map_Cmin >= 0 and $title_ro +w >= 0){ for(my $iR = $title_row +1; defined $oWkS->{Ma +xRow} && $iR <= $oWkS->{MaxRow} ; $iR++){ $internal_cell = $oWkS->{Cells}[$iR][$map_ +Cmin]; $active_cell = $oWkS->{Cells}[$iR][$active +_col]; $family_cell = $oWkS->{Cells}[$iR][$family +_col]; if(defined $internal_cell and defined $act +ive_cell and defined $family_cell and length(decode('cp1252',$interna +l_cell->{Val})) gt 0 and !(length(decode('cp1252',$active_cell->{Val} +)) gt 0) and length(decode('cp1252',$family_cell->{Val})) gt 0) { #fo every family print a line my @families = split(/[,.]+/, decode(' +cp1252',$family_cell->{Val})); foreach $family(@families){ + my $materialmapping_item = $materialmapping_table_xml->createElemen +t("item"); + + #set family $materialmapping_item->setAttribut +e("pr_family",$family); #set item_number $item_number_cell = $oWkS->{Cells} +[$iR][$artno_col]; if(defined $item_number_cell and l +ength(decode('cp1252',$item_number_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("item_number",decode('cp1252',$item_number_cell->{Val})); } + #add all propriety for(my $prC = $map_Cmin+2; $prC <= + $map_Cmax ; $prC++) { $pr_cell_name = $oWkS->{Ce +lls}[$title_row][$prC]; $pr_cell = $oWkS->{Cells}[ +$iR][$prC]; if(defined $pr_cell and de +fined $pr_cell_name and length(decode('cp1252',$pr_cell->{Val})) gt 0 + and length(decode('cp1252',$pr_cell_name->{Val})) gt 0) { + my @pr = split(/[,.]+/, de +code('cp1252',$pr_cell->{Val})); + foreach $pr(@pr){ + + $materialmapping_item->setAttribute(decode( +'cp1252',$pr_cell_name->{Val}),$pr); + } + } + } + #for every multiple less 2 propriety print a line + for(my $prC2 = $map_Cmin+2; $prC2 <= $map_Cmax ; $prC2++) { $pr_cell_name2 = $oWkS->{C +ells}[$title_row][$prC2]; $pr_cell2 = $oWkS->{Cells} +[$iR][$prC2]; + if(defined $pr_cell2 and defined $pr_cell_name2 and + length(decode('cp1252',$pr_cell2->{Val})) gt 0 and length(decode('cp +1252',$pr_cell_name2->{Val})) gt 0) { my @pr2 = split(/[,.]+/, d +ecode('cp1252',$pr_cell2->{Val})); + foreach $pr2(@pr2){ + + if(@pr2 gt 1) + { + + my $materialmapping_item2 = $materialmapp +ing_table_xml->createElement("item"); + #set family $materialmapping_item2->setAttribu +te("pr_family",$family); #set item_number $item_number_cell = $oWkS->{Cells} +[$iR][$artno_col]; if(defined $item_number_cell and l +ength(decode('cp1252',$item_number_cell->{Val})) gt 0) { $materialmapping_item2->setAtt +ribute("item_number",decode('cp1252',$item_number_cell->{Val})); } + + for(my $prC3 = $map_Cmin+2; $prC3 <= $map +_Cmax ; $prC3++) + { + $pr_cell_name3 = $oWkS->{C +ells}[$title_row][$prC3]; + $pr_cell3 = $oWkS->{Cells} +[$iR][$prC3]; + if(defined $pr_cell3 and +defined $pr_cell_name3 and length(decode('cp1252',$pr_cell3->{Val})) +gt 0 and length(decode('cp1252',$pr_cell_name3->{Val})) gt 0) + { + if(($prC3 ne $prC2)) + { + my @pr3 = spl +it(/[,.]+/, decode('cp1252',$pr_cell3->{Val})); + foreach $pr3 +(@pr3) + { + + $materialmapping_item2->setAttribute(decode('cp1252',$pr_cell_name3 +->{Val}),$pr3); + } + } + } + } + + #set internal $materialmapping_item2->setAttribu +te("internal",decode('cp1252',$internal_cell->{Val})); #set colorzones $colorzones_cell = $oWkS->{Cells}[ +$iR][$colorzones_col]; if(defined $colorzones_cell and le +ngth(decode('cp1252',$colorzones_cell->{Val})) gt 0) { $materialmapping_item2->setAtt +ribute("colorzone",decode('cp1252',$colorzones_cell->{Val})); } #set description $description_cell = $oWkS->{Cells} +[$iR][$tec_desc_col]; if(defined $description_cell and l +ength(decode('cp1252',$description_cell->{Val})) gt 0) { $materialmapping_item2->setAtt +ribute("description",decode('cp1252',$description_cell->{Val})); } #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } } if(length($brand_string) gt 0) +{ $materialmapping_item2->se +tAttribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item2->se +tAttribute('ignore_brand',$ignored_brand_string); } } #set Customer if($customer_Cmin gt 0 and $custom +er_Cmax gt 0) { my $customer_string = ""; my $ignored_customer_string = +""; for(my $customerC = $customer_ +Cmin; $customerC <= $customer_Cmax ; $customerC++) { $customer_cell_name = $oWk +S->{Cells}[$title_row][$customerC]; $customer_cell = $oWkS->{C +ells}[$iR][$customerC]; if(defined $customer_c +ell_name and defined $customer_cell and length(decode('cp1252',$custo +mer_cell->{Val})) gt 0 and length(decode('cp1252',$customer_cell_name +->{Val})) gt 0) { if(decode('cp1252' +,$customer_cell->{Val}) eq '0') { $ignored_custo +mer_string = $ignored_customer_string . decode('cp1252',$customer_cel +l_name->{Val}) . ","; } else { $customer_stri +ng = $customer_string . decode('cp1252',$customer_cell_name->{Val}) . + ","; } } } if(length($customer_string) gt + 0){ $materialmapping_item2->se +tAttribute('customer',$customer_string); } if(length($ignored_customer_st +ring) gt 0){ $materialmapping_item2->se +tAttribute('ignore_customer',$ignored_customer_string); } } #set Market if($market_Cmin gt 0 and $market_C +max gt 0) { my $market_string = ""; my $ignored_market_string = "" +; for(my $marketC = $market_Cmin +; $marketC <= $market_Cmax ; $marketC++) { $market_cell_name = $oWkS- +>{Cells}[$title_row][$marketC]; $market_cell = $oWkS->{Cel +ls}[$iR][$marketC]; if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } } if(length($market_string) gt 0 +){ $materialmapping_item2->se +tAttribute('market',$market_string); } if(length($ignored_market_stri +ng) gt 0){ $materialmapping_item2->se +tAttribute('ignore_market',$ignored_market_string); + } } + if (($prC2 = $pr_cell2)){ { + + $materialmapping_item = $materialmapping_item2; + $materialmapping_table_xml_root->addChild($materialmapping_item2); + + $materialmapping_item2->setAttribute(decode('cp1252',$pr_cell_name2 +->{Val}),$pr2); + + } +} + } } } } + + + #for every duble propriety print a line + for(my $prC2 = $map_Cmin+2; $prC2 <= $map_Cmax ; $prC2++) { $pr_cell_name2 = $oWkS->{C +ells}[$title_row][$prC2]; $pr_cell2 = $oWkS->{Cells} +[$iR][$prC2]; if(defined $pr_cell2 and d +efined $pr_cell_name2 and length(decode('cp1252',$pr_cell2->{Val})) g +t 0 and length(decode('cp1252',$pr_cell_name2->{Val})) gt 0) { my @pr2 = split(/[,.]+/, d +ecode('cp1252',$pr_cell2->{Val})); + foreach $pr2(@pr2){ + if(@pr2 gt 1) + { + + + for(my $prC3 = $map_Cmin+2; $prC3 <= $map +_Cmax ; $prC3++) + { + $pr_cell_name3 = $oWkS->{C +ells}[$title_row][$prC3]; + $pr_cell3 = $oWkS->{Cells} +[$iR][$prC3]; + + if(defined $pr_cell3 and d +efined $pr_cell_name3 and length(decode('cp1252',$pr_cell3->{Val})) g +t 0 and length(decode('cp1252',$pr_cell_name3->{Val})) gt 0) { + if($prC3 gt $prC2) + { + my @pr3 = spl +it(/[,.]+/, decode('cp1252',$pr_cell3->{Val})); + foreach $pr3 +(@pr3) + { + + + if(@p +r3 gt 1) + { + + my $materialmapping_item3 = $materialmapping_table_xml->createEleme +nt("item"); + + + #se +t family $materialmapping_item3->setAttribu +te("pr_family",$family); #set item_number $item_number_cell = $oWkS->{Cells} +[$iR][$artno_col]; if(defined $item_number_cell and l +ength(decode('cp1252',$item_number_cell->{Val})) gt 0) { $materialmapping_item3->setAtt +ribute("item_number",decode('cp1252',$item_number_cell->{Val})); } + + + + + + + $mater +ialmapping_item3->setAttribute(decode('cp1252',$pr_cell_name3->{Val}) +,$pr3); + + + for(my $prC +4 = $map_Cmin+2; $prC4 <= $map_Cmax ; $prC4++) + { + $pr_cell_name4 = $oWkS->{C +ells}[$title_row][$prC4]; + $pr_cell4 = $oWkS->{Cells} +[$iR][$prC4]; + if($prC4 ne $prC3) + { + my @pr4 = spl +it(/[,.]+/, decode('cp1252',$pr_cell4->{Val})); + foreach $pr4 +(@pr4) + { + + + + $materialmapping_item3->setAttribute(deco +de('cp1252',$pr_cell_name4->{Val}),$pr4); + + } + }} + $materialmapping_item3->setAttribute(decode('cp1252',$pr_ce +ll_name2->{Val}),$pr2); + + #set internal $materialmapping_item3->setAttribu +te("internal",decode('cp1252',$internal_cell->{Val})); #set colorzones $colorzones_cell = $oWkS->{Cells}[ +$iR][$colorzones_col]; if(defined $colorzones_cell and le +ngth(decode('cp1252',$colorzones_cell->{Val})) gt 0) { $materialmapping_item3->setAtt +ribute("colorzone",decode('cp1252',$colorzones_cell->{Val})); } #set description $description_cell = $oWkS->{Cells} +[$iR][$tec_desc_col]; if(defined $description_cell and l +ength(decode('cp1252',$description_cell->{Val})) gt 0) { $materialmapping_item3->setAtt +ribute("description",decode('cp1252',$description_cell->{Val})); } #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } } if(length($brand_string) gt 0) +{ $materialmapping_item3->se +tAttribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item3->se +tAttribute('ignore_brand',$ignored_brand_string); } } #set Customer if($customer_Cmin gt 0 and $custom +er_Cmax gt 0) { my $customer_string = ""; my $ignored_customer_string = +""; for(my $customerC = $customer_ +Cmin; $customerC <= $customer_Cmax ; $customerC++) { $customer_cell_name = $oWk +S->{Cells}[$title_row][$customerC]; $customer_cell = $oWkS->{C +ells}[$iR][$customerC]; if(defined $customer_c +ell_name and defined $customer_cell and length(decode('cp1252',$custo +mer_cell->{Val})) gt 0 and length(decode('cp1252',$customer_cell_name +->{Val})) gt 0) { if(decode('cp1252' +,$customer_cell->{Val}) eq '0') { $ignored_custo +mer_string = $ignored_customer_string . decode('cp1252',$customer_cel +l_name->{Val}) . ","; } else { $customer_stri +ng = $customer_string . decode('cp1252',$customer_cell_name->{Val}) . + ","; } } } if(length($customer_string) gt + 0){ $materialmapping_item3->se +tAttribute('customer',$customer_string); } if(length($ignored_customer_st +ring) gt 0){ $materialmapping_item3->se +tAttribute('ignore_customer',$ignored_customer_string); } } #set Market if($market_Cmin gt 0 and $market_C +max gt 0) { my $market_string = ""; my $ignored_market_string = "" +; for(my $marketC = $market_Cmin +; $marketC <= $market_Cmax ; $marketC++) { $market_cell_name = $oWkS- +>{Cells}[$title_row][$marketC]; $market_cell = $oWkS->{Cel +ls}[$iR][$marketC]; if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } } if(length($market_string) gt 0 +){ $materialmapping_item3->se +tAttribute('market',$market_string); } if(length($ignored_market_stri +ng) gt 0){ $materialmapping_item3->se +tAttribute('ignore_market',$ignored_market_string); + } } + + $materialmapping_item = $materialmapping_item3; + $materialmapping_table_xml_root->addChild($materialmapping_ +item3); +} } } } } } } } } + + + #for every triple propriety print a line + for(my $prC2 = $map_Cmin+2; $prC2 <= $map_Cmax ; $prC2++) { $pr_cell_name2 = $oWkS->{C +ells}[$title_row][$prC2]; $pr_cell2 = $oWkS->{Cells} +[$iR][$prC2]; if(defined $pr_cell2 and d +efined $pr_cell_name2 and length(decode('cp1252',$pr_cell2->{Val})) g +t 0 and length(decode('cp1252',$pr_cell_name2->{Val})) gt 0) { my @pr2 = split(/[,.]+/, d +ecode('cp1252',$pr_cell2->{Val})); + foreach $pr2(@pr2){ + if(@pr2 gt 1) + { + + + for(my $prC3 = $map_Cmin+2; $prC3 <= $map +_Cmax ; $prC3++) + { + $pr_cell_name3 = $oWkS->{C +ells}[$title_row][$prC3]; + $pr_cell3 = $oWkS->{Cells} +[$iR][$prC3]; + + if(defined $pr_cell3 and d +efined $pr_cell_name3 and length(decode('cp1252',$pr_cell3->{Val})) g +t 0 and length(decode('cp1252',$pr_cell_name3->{Val})) gt 0) { + if($prC3 gt $prC2) + { + my @pr3 = spl +it(/[,.]+/, decode('cp1252',$pr_cell3->{Val})); + foreach $pr3 +(@pr3) + { + + + if(@p +r3 gt 1) + { + + + + for(my $prC +4 = $map_Cmin+2; $prC4 <= $map_Cmax ; $prC4++) + { + $pr_cell_name4 = $oWkS->{C +ells}[$title_row][$prC4]; + $pr_cell4 = $oWkS->{Cells} +[$iR][$prC4]; + if(($prC4 gt $prC3)&&($prC +4 gt $prC2)) + { + my @pr4 = spl +it(/[,.]+/, decode('cp1252',$pr_cell4->{Val})); + foreach $pr4 +(@pr4) + { + + if(@pr4 gt 1){ + my $ma +terialmapping_item4 = $materialmapping_table_xml->createElement("item +"); + + $mate +rialmapping_item4->setAttribute("pr_family",$family); #set item_number $item_number_cell = $oWkS->{Cells} +[$iR][$artno_col]; if(defined $item_number_cell and l +ength(decode('cp1252',$item_number_cell->{Val})) gt 0) { $materialmapping_item4->setAtt +ribute("item_number",decode('cp1252',$item_number_cell->{Val})); } + + + + + $materi +almapping_item4->setAttribute(decode('cp1252',$pr_cell_name4->{Val}), +$pr4); + + $materialmapping_item4->setAttribute(decode('cp1252',$pr_ce +ll_name3->{Val}),$pr3); + $materialmapping_item4->setAttribute(dec +ode('cp1252',$pr_cell_name2->{Val}),$pr2); + + for(my $prC5 = $map_Cmin+2; $prC5 <= $map_Cmax ; $prC5++) + { + $pr_cell_name5 = $oWkS->{C +ells}[$title_row][$prC5]; + $pr_cell5 = $oWkS->{Cells} +[$iR][$prC5]; + if(($prC5 ne $prC3)&&($prC +5 ne $prC2)&&($prC5 ne $prC4)) + { + my @pr5 = spl +it(/[,.]+/, decode('cp1252',$pr_cell5->{Val})); + foreach $pr5 +(@pr5) + { + + + + $materi +almapping_item4->setAttribute(decode('cp1252',$pr_cell_name5->{Val}), +$pr5); + + }}} + + #set internal $materialmapping_item4->setAttribu +te("internal",decode('cp1252',$internal_cell->{Val})); #set colorzones $colorzones_cell = $oWkS->{Cells}[ +$iR][$colorzones_col]; if(defined $colorzones_cell and le +ngth(decode('cp1252',$colorzones_cell->{Val})) gt 0) { $materialmapping_item4->setAtt +ribute("colorzone",decode('cp1252',$colorzones_cell->{Val})); } #set description $description_cell = $oWkS->{Cells} +[$iR][$tec_desc_col]; if(defined $description_cell and l +ength(decode('cp1252',$description_cell->{Val})) gt 0) { $materialmapping_item4->setAtt +ribute("description",decode('cp1252',$description_cell->{Val})); } #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } } if(length($brand_string) gt 0) +{ $materialmapping_item4->se +tAttribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item4->se +tAttribute('ignore_brand',$ignored_brand_string); } } #set Customer if($customer_Cmin gt 0 and $custom +er_Cmax gt 0) { my $customer_string = ""; my $ignored_customer_string = +""; for(my $customerC = $customer_ +Cmin; $customerC <= $customer_Cmax ; $customerC++) { $customer_cell_name = $oWk +S->{Cells}[$title_row][$customerC]; $customer_cell = $oWkS->{C +ells}[$iR][$customerC]; if(defined $customer_c +ell_name and defined $customer_cell and length(decode('cp1252',$custo +mer_cell->{Val})) gt 0 and length(decode('cp1252',$customer_cell_name +->{Val})) gt 0) { if(decode('cp1252' +,$customer_cell->{Val}) eq '0') { $ignored_custo +mer_string = $ignored_customer_string . decode('cp1252',$customer_cel +l_name->{Val}) . ","; } else { $customer_stri +ng = $customer_string . decode('cp1252',$customer_cell_name->{Val}) . + ","; } } } if(length($customer_string) gt + 0){ $materialmapping_item4->se +tAttribute('customer',$customer_string); } if(length($ignored_customer_st +ring) gt 0){ $materialmapping_item4->se +tAttribute('ignore_customer',$ignored_customer_string); } } #set Market if($market_Cmin gt 0 and $market_C +max gt 0) { my $market_string = ""; my $ignored_market_string = "" +; for(my $marketC = $market_Cmin +; $marketC <= $market_Cmax ; $marketC++) { $market_cell_name = $oWkS- +>{Cells}[$title_row][$marketC]; $market_cell = $oWkS->{Cel +ls}[$iR][$marketC]; if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } } if(length($market_string) gt 0 +){ $materialmapping_item4->se +tAttribute('market',$market_string); } if(length($ignored_market_stri +ng) gt 0){ $materialmapping_item4->se +tAttribute('ignore_market',$ignored_market_string); + } } + + $materialmapping_item = $materialmapping_item4; + $materialmapping_table_xml_root->addChild($materialmapping_ +item4); +} } } } } } } } } } } } } + + #set internal $materialmapping_item->setAttribut +e("internal",decode('cp1252',$internal_cell->{Val})); #set colorzones $colorzones_cell = $oWkS->{Cells}[ +$iR][$colorzones_col]; if(defined $colorzones_cell and le +ngth(decode('cp1252',$colorzones_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("colorzone",decode('cp1252',$colorzones_cell->{Val})); } #set description $description_cell = $oWkS->{Cells} +[$iR][$tec_desc_col]; if(defined $description_cell and l +ength(decode('cp1252',$description_cell->{Val})) gt 0) { $materialmapping_item->setAttr +ibute("description",decode('cp1252',$description_cell->{Val})); } #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } } if(length($brand_string) gt 0) +{ $materialmapping_item->set +Attribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item->set +Attribute('ignore_brand',$ignored_brand_string); } } #set Customer if($customer_Cmin gt 0 and $custom +er_Cmax gt 0) { my $customer_string = ""; my $ignored_customer_string = +""; for(my $customerC = $customer_ +Cmin; $customerC <= $customer_Cmax ; $customerC++) { $customer_cell_name = $oWk +S->{Cells}[$title_row][$customerC]; $customer_cell = $oWkS->{C +ells}[$iR][$customerC]; if(defined $customer_c +ell_name and defined $customer_cell and length(decode('cp1252',$custo +mer_cell->{Val})) gt 0 and length(decode('cp1252',$customer_cell_name +->{Val})) gt 0) { if(decode('cp1252' +,$customer_cell->{Val}) eq '0') { $ignored_custo +mer_string = $ignored_customer_string . decode('cp1252',$customer_cel +l_name->{Val}) . ","; } else { $customer_stri +ng = $customer_string . decode('cp1252',$customer_cell_name->{Val}) . + ","; } } } if(length($customer_string) gt + 0){ $materialmapping_item->set +Attribute('customer',$customer_string); } if(length($ignored_customer_st +ring) gt 0){ $materialmapping_item->set +Attribute('ignore_customer',$ignored_customer_string); } } #set Market if($market_Cmin gt 0 and $market_C +max gt 0) { my $market_string = ""; my $ignored_market_string = "" +; for(my $marketC = $market_Cmin +; $marketC <= $market_Cmax ; $marketC++) { $market_cell_name = $oWkS- +>{Cells}[$title_row][$marketC]; $market_cell = $oWkS->{Cel +ls}[$iR][$marketC]; if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } } if(length($market_string) gt 0 +){ $materialmapping_item->set +Attribute('market',$market_string); } if(length($ignored_market_stri +ng) gt 0){ $materialmapping_item->set +Attribute('ignore_market',$ignored_market_string); + } } + + $materialmapping_table_xml_root->addChild($materialmapping_item); + }; } } } } print "Imported: $filename \n"; } $materialmapping_table_xml->toFile($materialmapping_file,2);

Replies are listed 'Best First'.
Re: Speed up perl code
by grizzley (Chaplain) on Sep 21, 2012 at 06:58 UTC
    You hope that people here will analyze 1000 lines of code? Maybe it's time for you to use Devel::NYTProf?
Re: Speed up perl code
by bulk88 (Priest) on Sep 21, 2012 at 08:54 UTC
    Use B::Concise. Look at how many opcodes each line takes. For example,
    if(decode('cp1252',$oWkC->{Val}) eq "Confi +gurator mapping") { for (my $iR = $oWkS->{MinRow} +1; defi +ned $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++)
    MinRow and MaxRow hash slices should be copied to lexical my scalars. my scalars are faster than a hash reference, and you have a loop there.
    if(defined $market_cel +l_name and defined $market_cell and length(decode('cp1252',$market_ce +ll->{Val})) gt 0 and length(decode('cp1252',$market_cell_name->{Val}) +) gt 0) { if(decode('cp1252' +,$market_cell->{Val}) eq '0') { $ignored_marke +t_string = $ignored_market_string . decode('cp1252',$market_cell_name +->{Val}) . ","; } else { $market_string + = $market_string . decode('cp1252',$market_cell_name->{Val}) . ","; } } }
    You wrote "decode('cp1252',$market_cell_name->{Val})" three times (more memory for opcodes) and ran it twice. Why are you decoding just for '0'? '0' is the same in 1252 as in Latin 1. Dont do a decode just to check for '0', or do the decode ONCE and cache the decoded result. IDK if "eq ''" or your "length() > 0" is faster. Use NYTProf and find out. I've got a question, was this written by hand or is the output of a script generating tool?
    my $market_Cmin = -1; my $market_Cmax = -1; my $title_row = -1; for (my $iC = $oWkS->{MinCol}; defined $oWkS->{Max +Col} && $iC <= $oWkS->{MaxCol} ; $iC++)
    IDK the API of the modules you are using, but is "$oWkS->{MaxCol}" really going to stop being defined at some sudden point in the loop? undef converts to 0 in perl.
    } elsif(decode('cp1252',$oWkC->{Val}) eq "Market"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $market_Cmax = $area->[3]; } } $market_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Customer"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $customer_Cmax = $area->[3]; } } $customer_Cmin = $iC; } elsif(decode('cp1252',$oWkC->{Val}) eq " +Brand"){ foreach my $area ( @{ $oWkS->{MergedAr +ea} } ) { if($area->[1] eq $iC and $area->[0 +] eq $oWkS->{MinRow}){ $brand_Cmax = $area->[3]; } } $brand_Cmin = $iC; }
    You need to make functions and not call decode('cp1252',$oWkC->{Val}) a dozen times in a row. Also, if you know you will find $*****_Cmax only once (currently the *last* Cmax that matches the condition is selected, IDK if you require this), add last to all your foreach loops so they exit after the first match rather than continuing to iterate through all the remaining data.
    for(my $iR = $title_row +1; defined $oWkS->{MaxRow} && $iR <= +$oWkS->{MaxRow} ; $iR++){ $internal_cell = $oWkS->{Cells}[$iR][$map_ +Cmin]; $active_cell = $oWkS->{Cells}[$iR][$active +_col]; $family_cell = $oWkS->{Cells}[$iR][$active +_col];
    Copy the $oWkS->{Cells}$iR array reference to a my variable, then do map/active/family on that 1 my variable, not do 2 extra dereferences every line. You can also do ($internal_cell, $active_cell, $family_cell) = $oWkS->{Cells}[$iR][$map_Cmin, $active_col, $active_col]; and save more overhead/opcodes.
    $oWkC = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]; if(defined $oWkC) { if(decode('cp1252',$oWkC->{Val}) eq "Confi +gurator mapping") {
    Combine the 2 ifs with &&, you saved 1-3 opcodes. The "after" look is visually messy but
    my $map_Cmin = -1; # the first is internal my $map_Cmax = -1; my $artno_col = -1; my $active_col = -1; my $colorzones_col = -1; my $family_col = -1; my $tec_desc_col = -1; my $brand_Cmin = -1; my $brand_Cmax = -1; my $customer_Cmin = -1; my $customer_Cmax = -1; my $market_Cmin = -1; my $market_Cmax = -1; my $title_row = -1;
    this can be improved in speed at expense of visual clarity. Put all that in one ";" terminated statement. Short example,  my ($map_Cmax,$artno_col, $active_col)  = (-1, -1, -1);. Use whitespace and tabs to lay out the variable names in columns to try and organize that one long my statement.

    Someone in the past brought up your "string" gt 0 lines. IDK if they are right. For example,
    #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++)
    $brand_Cmax has to be a number, not a string for this to work.
    #set Brands if($brand_Cmin gt 0 and $brand_Cma +x gt 0) { my $brand_string = ""; my $ignored_brand_string = ""; for(my $brandC = $brand_Cmin; +$brandC <= $brand_Cmax ; $brandC++) { $brand_cell_name = $oWkS-> +{Cells}[$title_row][$brandC]; if(defined $brand_cell_nam +e and decode('cp1252',$brand_cell_name->{Val}) ne "Epta std") { $brand_cell = $oWkS->{ +Cells}[$iR][$brandC]; if(defined $brand_cell + and length(decode('cp1252',$brand_cell->{Val})) gt 0) { if(decode('cp1252' +,$brand_cell->{Val}) eq '0') { $ignored_brand +_string = $ignored_brand_string . decode('cp1252',$brand_cell_name->{ +Val}) . ","; } else { $brand_string += $brand_string . decode('cp1252',$brand_cell_name->{Val}) . ","; } } } }
    considered adding a "last" to this numeric for loop?
    if(length($brand_string) gt 0){ $materialmapping_item2->se +tAttribute('brand',$brand_string); } if(length($ignored_brand_strin +g) gt 0){ $materialmapping_item2->se +tAttribute('ignore_brand',$ignored_brand_string); }
    I don't remember if a "something() if $condition;" is faster/less ops in Concise than new block opening ifs. I would switch then to no block ifs.

    Edit: fixed module name

      The script is writed by hand. thank you for your very esauxtive answer. I have used your adwice, but I understand that it speding a lot of time here:

       while( ($filename = readdir(DIR))){

      the module Spreadsheet::ParseExcel spend a lot of time to open the excel file... Is fast to open light .xls files, and very slow to open big (more than 4mb) .xls files I think is nothing to do. anyway thankyou very much

        Yes, it is spending much time in that loop, because 90% of your code is there.

        You need to divide your code into subroutines, preferably small ones, to get a reasonable analysis from a profiler.

Re: Speed up perl code
by fluffyvoidwarrior (Monk) on Sep 21, 2012 at 07:55 UTC
    Lots of code.

    Just a few generic suggestions that you've probably thought of anyway
    Some jobs are actually big and do take a long time. However it's easy to make a job big that shouldn't be. I would suggest going back to basics for a few hours and pondering the underlying design of your program. Have you intrinsically designed it wrong? Is there a better approach to the same goal? Are you not processing in a smart way, perhaps multiplying the work as your program runs rather than reducing it. Its easy to tend towards infinite with something that's OK on a small dataset but multiplies up exponentially when you feed it large data.

    Assuming you can only go down the route you have already taken:-
    Have you specifically located the bottlenecks? Maybe when you do you should then benchmark chunks of code and run repeatedly with modifications.

    You may find that you need to write your own optimised code for the cutting edge to your particular problem, rather than use a generic cpan module.

    My guess is that you'll get a lot more help on here if you locate your problem areas, present much smaller code snippets and ask "How can I speed this up?"

    Either way, there probably isn't much point in speculating 'till you've located the bottleneck(s). Possible quick fix : if it's thrashing your hard drive move it to a ramdisk.

    Check this out:
    http://makepp.sourceforge.net/1.50/perl_performance.html

Re: Speed up perl code
by Anonymous Monk on Sep 21, 2012 at 07:18 UTC

    You're wrote  decode( 'cp1252' entirely too many times (131 versus 24 unique, that is 107 duplicates )

    Multiply that by the number of iterations through the 42 odd loops, and no wonder its slow

    Also, there are way too many levels of indentation -- longest is 13 deep -- you should turn each for loop into a function call , really

Re: Speed up perl code
by Anonymous Monk on Sep 21, 2012 at 06:55 UTC

    4min on what kind of machine?

      Came in at just under 3 (hours) on my Commodore64. :P