Beefy Boxes and Bandwidth Generously Provided by pair Networks RobOMonk
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Out of memory and While replacements with excel XLSX application

by Anonymous Monk
on Oct 08, 2012 at 08:00 UTC ( #997762=note: print w/ replies, xml ) Need Help??


in reply to Out of memory and While replacements with excel XLSX application

I have a feeling I am copying the data into memory several times, but I do not know how to avoid this.

Rewrite your program into subroutines, use lots of references, and watch the size of your program at each stage

There is a lot of copy/paste duplication in your program that should be rewritten

There are a lot of short variable names, way too many to keep track of, way too many to help you substantially rewrite what you posted

Also you say The total amount of rows is close to 1 million. and that is close to the MAXIMUM LIMIT for Excel of 1,048,576 rows by 16,384 columns

You might switch to DBD::SQLite, see Loading 283600 records (WordNet), Re: Is there a way(maybe tool?) to know the amount of memory an array takes?, dynamic bulk insert in sqlite


Comment on Re: Out of memory and While replacements with excel XLSX application
Re^2: Out of memory and While replacements with excel XLSX application
by Anonymous Monk on Oct 08, 2012 at 09:29 UTC

    Here is a start at refactoring

    I end up with fudge10

    #!C:\strawberry\perl\bin\perl ####Load Packages use strict; use warnings; use Win32::OLE qw(in with); #~ use Win32::OLE::Const 'Microsoft Excel'; use Win32::OLE::Variant; use Win32::OLE::NLS qw(:LOCALE :DATE); use List::MoreUtils qw(natatime); #~ use Excel::Writer::XLSX; ############################################################ ####Initialize Excel Object and Count Files to Process $Win32::OLE::Warn = 3; #die on errors... #get already active Excel application or open new my $Excel = Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new( 'Excel.Application', 'Quit' ); #Get the file names of the production files and prints a status of the + results my $ProdDirect = 'C:/Users/McLovin/Documents/Thesis/Data/'; opendir DH, $ProdDirect or die "Cannot open $ProdDirect: $!"; my @files = grep { !-d } readdir DH; print "@files\n"; closedir DH; my $NumberofFiles = @files; print "Number of files is: $NumberofFiles\n"; ###################################################################### +########## ################################################################# ################Start Processing the Data######################## ################################################################# my $counti; #used to count the excelfiles my $usefile; #The file that is being used my @recID; #an array of ID's for the worksheets my $val; #A test value that is used to test whether a row should b +e included my @right ; #The first of the arrays that will be filled during the process an +d will be further disected later my @prod; #The second array, same as above my @legal; #The third #####Loop over all the files for ( $counti = 0 ; $counti < $NumberofFiles ; $counti++ ) { ##### open Excel file -- This is the input data files print "Working on file $counti of $NumberofFiles\n"; $usefile = $files[$counti]; ###This points to the folder where the excel files are placed my $workfile = "C:/Users/McLovin/Documents/Thesis/Datatal/" . "$us +efile"; print "$workfile\n"; BlahBlahNameHere( $Excel, $workfile, \@recID, \@right, \@legal, \@prod, ); } ###################################################################### +########## #Prints of the created arrays #print "@right\n"; #print "@legal\n"; #print "@prod\n"; ###################################################################### +########## ###################################################################### +############## #the process that eliminates duplicates in the @right array based on c +riteria #makes a copy of the array as it is destroyed in the next while statem +ent, This is maybe one of the problems my @rig = @right; #copy of array my @righ; # a new array that is the result of the while state +ment below my %seen; # a hash that stores agreementnumbers for unique en +tries #The array is in exact sets of 9 strings and i want it spliced a inter +vals of exactly those intervals while ( my ( $m, $n, $o, $p, $q, $r, $s, $t, $y ) = splice( @right, 0, + 9 ) ) { last if $m !~ /^\d{10}$/; next if $seen{$m}++; if ( $n =~ /Specific_region/ ) { if ( $o =~ /NG/ ) { push @righ, $m, $o, $p, $q, $r, $s, $t; } } } # print "@righ\n"; my @leg; #an array that holds the result for the next while statem +ent my @spli; #used as a container for certain entries in the while sta +tement my @joi; #another middle of equation array for picking up results %seen = () ; #emties the previous hash as the uniqueness of entries is also imp +ortant here my @tes; #yet another array for picking up results ###################################################################### +########## #the process that insures that each returned value is printed in the c +orrect #form and coupled with the ten digit number #again the array is organized in 6 values in a row that need to be sep +erated out into rows. while ( my ( $h, $aa, $rr, $j, $k, $l ) = splice( @legal, 0, 6 ) ) { last if $h !~ /^\d{10}$/; if ( $aa =~ /Specific_region/ ) { if ( $rr =~ /NG/ ) { if ( $j =~ /\n/ ) { next if $seen{$h}++; my @spli = split( /\n/, $j ); foreach my $n (@spli) { if ( $n =~ /LSD/ ) { my @tes = split( /LSD/, $n ); foreach my $lon (@tes) { if ( $lon =~ /SEC/ ) { my @joi = split( /-|W|:|\s|,|\(/, $lon + ); my $chans = @joi; my $eleg = join( "", @joi[ 0, 2, 1, $chans - 1 +] ); push @leg, $eleg, $h, $k, $l; } } } elsif ( $n =~ /\(/ ) { my @joi = split( /-|W|:|\s|\(/, $n ); my $chans = @joi; my $eleg = join( "", @joi[ 0, 2, 1, $chans - +2 ] ); push @leg, $eleg, $h, $k, $l; } else { my @joi = split( /-|W|:|\s/, $n ); my $chans = @joi; my $eleg = join( "", @joi[ 0, 2, 1, $chans - +1 ] ); push @leg, $eleg, $h, $k, $l; } } } else { next if $seen{$h}++; my @joi = split( /-|W|:|\s/, $j ); my $chans = @joi; my $eleg = join( "", @joi[ 0, 2, 1, $chans - 1 ] ); push @leg, $eleg, $h, $k, $l; } } } } #print "@leg\n"; my @peg; foreach my $loma (@leg) { if ( $loma =~ /^\d{7}$/ ) { substr( $loma, 6, 0, 0 ); push @peg, $loma; } else { push @peg, $loma; } } ###################################################################### +############## #the process that creates the production array for the entries my @produ; while ( my ( $cp, $aaa, $rrr, $dp, $ep, $fp, $gp, $hp, $ip, $jp, $kp, $lp, $mp, $np, $op, $pp, $qp, $rp, $sp, $tp, $up, $vp, $wp, $yp, $xp ) = splice( @prod, 0, 25 ) ) { last if $cp !~ /^\d{10}$/; if ( $aaa =~ /Specific_region/ ) { if ( $rrr =~ /NG/ ) { unless ( $dp =~ /a specific repeated text for all relevant entr +ies/ ) { #eliminate this if statement for option two, where entries with no act +ual production is included if ( defined($qp) && $qp =~ /\d\d-\d\d-\d{4}/ ) { push @produ, $dp, $ep, $cp, $fp, $gp, $hp, $ip, $j +p, $kp, $lp, $mp, $np, $op, $pp, $qp, $rp, $sp, $tp, $up +, $vp, $wp, $yp, $xp; } } } } } # print "@produ\n"; my @nwells; #an array that collects the results #creates the 4.2 entries "agreements with no wells while ( my ( $mn, $nn, $on, $pn, $qn, $rn, $sn, $tn, $yn ) = splice( @rig, 0, 9 ) ) { last if $mn !~ /^\d{10}$/; if ( $nn =~ /Specific_region/ ) { if ( $on =~ /NG/ ) { if ( $yn =~ /a specific repeated text for all relevant ent +ries/ ) { push @nwells, $mn, $on, $pn, $qn, $rn, $sn, $tn, $yn; } } } } # print "@nwells\n"; #Places results into arrays of arrays for easy computation in excel. u +ses natatime again the arrays # are of a specific size. per row. my @AAR; { my $iter = natatime 7, @righ; while ( my @tmp = $iter->() ) { push @AAR, \@tmp; } } my @BAR; { my $iter = natatime 4, @peg; while ( my @tmp = $iter->() ) { push @BAR, \@tmp; } } my @CAR; { my $iter = natatime 23, @produ; while ( my @tmp = $iter->() ) { push @CAR, \@tmp; } } my @DAR; { my $iter = natatime 8, @nwells; while ( my @tmp = $iter->() ) { push @DAR, \@tmp; } } #####The new excel sheets that should contain the results my $workbooknew = Excel::Writer::XLSX->new('re1.xlsx'); my $worksheetnew = $workbooknew->add_worksheet(); $worksheetnew->keep_leading_zeros(); $worksheetnew->set_column( 'A:G', 30 ); $worksheetnew->write_col( 'A2', \@AAR ); ####This is the data on the legalnumber - introduce keep_leading zeroe +s for correct legal form my $workbooknew1 = Excel::Writer::XLSX->new('re2.xlsx'); my $worksheetnew1 = $workbooknew1->add_worksheet(); $worksheetnew1->keep_leading_zeros(); $worksheetnew1->set_column( 'A:D', 15 ); $worksheetnew1->write_col( 'A2', \@BAR ); # # ####This is the data on the production of the wells - introduce ke +ep_leading zeroes for correct legal form my $workbooknew2 = Excel::Writer::XLSX->new('re3.xlsx'); my $worksheetnew2 = $workbooknew2->add_worksheet(); my $worksheetnew3 = $workbooknew2->add_worksheet(); $worksheetnew2->set_column( 'A:W', 50 ); $worksheetnew2->write_col( 'A2', \@CAR ); $worksheetnew3->set_column( 'A:H', 30 ); $worksheetnew3->write_col( 'A2', \@DAR ); sub ValueTargetCols { my( $sheet, $targets, $rows, $cols ) = @_; for my $col ( @$cols ) { for my $row( @$rows ) { if ( my $val = eval { $sheet->Cells( $row, $col )->{Value} + } ) { for my $target ( @$targets ){ push @{$target}, $val; } } } } return; } sub BlahBlahNameHere { my( $Excel, $workfile, $recID, $right, $legal, $prod ) = @_; my $Book = $Excel->Workbooks->Open($workfile); my $sheetcnt = $Book->Worksheets->Count(); #~ foreach my $r ( 1 .. $sheetcnt ) { { my $Sheet = $Book->Worksheets( 1 ); push @$recID, $Sheet->{Name}; print "Worksheet name is $Sheet->{Name}\n"; my $Tot_Rows = $Sheet->UsedRange->Rows->{'Count'}; my $Tot_Cols = $Sheet->UsedRange->Columns->{'Count'}; my $firstCol = eval { $Sheet->Cells( 1, 1)->{'Value'} }; if( defined $firstCol and $firstCol =~ /^\d{10}$/ ) { push @$right, $firstCol; push @$legal, $firstCol; push @$prod, $firstCol; ValueTargetCols( $Sheet, [ $right, $legal, $prod, ], [ 1 .. $Tot_Rows ], [ qw{ 4 5 } ], ); ValueTargetCols( $Sheet, [ $legal, ], [ 1 .. $Tot_Rows ], [ qw{ 6 } ], ); ValueTargetCols( $Sheet, [ $right, ], [ 1 .. $Tot_Rows ], [ qw{ 7 8 9 } ], ); ValueTargetCols( $Sheet, [ $right, $legal, ], [ 1 .. $Tot_Rows ], [ qw{ 10 11 } ], ); ValueTargetCols( $Sheet, [ $prod, $right, ], [ 1 .. $Tot_Rows ], [ qw{ 12 } ], ); ValueTargetCols( $Sheet, [ $prod ], [ 1 .. $Tot_Rows ], [ qw{ 13 14 15 17 18 20 21 22 23 24 25 26 27 32 33 34 35 36 37 38 39 } ], ); } } $Book->Close; } __END__
      ValueTargetCols should loop over rows first then cols, but it probably makes no difference in the results in this case

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://997762]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2014-04-18 03:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (461 votes), past polls