Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Handling HUGE amounts of data

by Dandello (Scribe)
on Feb 01, 2011 at 00:29 UTC ( #885386=note: print w/ replies, xml ) Need Help??


in reply to Handling HUGE amounts of data

Still having 'out of memory issues' even though it finishes its run on my machine.

But it's not finishing on the machine it needs to run on. And since that one's six hours away, there's not much I can do about it except get the code working even better.

Here's the initial data from tmp/datatrans.txt: 1|112|3|21|vertical|1|1296362070|0,400,800,1200,1600,2000,2400,2800,3200,3600,4000,4400,4800,5200,5600,6000,6400,6800,7200,7600,8000,8400,|112,448,448,671,112,336,112,560,112,1119,448,448,783,783,1902,3133,4812,4588,8729,9400,14995,5371,|

UPDATE

I've split the script into two and am using Tie::File on the second part to access the huge chunk of data one line at a time. So far so good. No more 'out of memory' with the data above.

So here's the code as I have it now:

#!/usr/bin/perl #!/usr/bin/perl # $Id: arrayout $ # $Date: 1.27.11 $ # $HeadURL: adamant.net $ # $Revision: 2011 $ # $Source: /arrayout.pl $ ###################################################################### +############ use strict; #use warnings; use CGI::Carp; use lib 'C://testing/lib'; use List::Util qw(sum); use Math::Random::MT qw(rand); #use Math::Random::MT::Auto qw(rand); use Readonly; our $VERSION = 1.20; Readonly my $PCNT => 100; Readonly my $C3 => 3; Readonly my $C4 => 4; Readonly my $C5 => 5; Readonly my $C60 => 60; Readonly my $C10 => 10; Readonly my $NEG => -1; Readonly my $YEAR => 1900; my $filename = 'tmp/datatrans.txt'; open my $DAT, '<', $filename or croak 'cannot open file'; my $dataa = <$DAT>; close $DAT or croak 'cannot close SFILE'; my ( $model, $initial, $copyerr, $LST, $file, $format, $timein, $popyr, $popest, ); #tmp/datatrans.txt sample= 1|8|5|8|newt|2|1296234929|0,10,15,20,25,30, +35,40,45,|5,10,15,10,15,25,20,30,20,|; ( $model, $initial, $copyerr, $LST, $file, $format, $timein, $popyr, $ +popest ) = split /[|]/xsm, $dataa; my @mpe = split /\,/xsm, $popest; my @mpy = split /\,/xsm, $popyr; $#mpy = $LST; $#mpe = $LST; $mpy[0] = 0; # CALCS my @grand = (0); my @popbyyer = ( $mpe[0] ); my @test = (0); for my $y ( 1 .. $LST ) { my $y_diff = ( $mpy[$y] - $mpy[ $y - 1 ] ) || 1; my $e_diff = ( $mpe[$y] - $mpe[ $y - 1 ] ); $popbyyer[$y] = $e_diff / $y_diff; $grand[$y] = $grand[ $y - 1 ] + ( ( $e_diff < 0 ) ? abs($e_diff) : + 0 ); if ( $e_diff > 0 ) { $test[$y] = $e_diff; } else { $test[$y] = 0; } } my $gener = ( sort { $a <=> $b } @mpy )[$NEG]; my $total = $mpe[0] + sum(@test); #build data table; my @dr0 = (); my @dr1 = (); my @dr2 = (); my @dr3 = (); my @dr4 = (); my @dr5 = (); $dr0[0] = 0; $dr3[0] = 0; $dr5[0] = 0; foreach my $tablem ( 1 .. $LST - 1 ) { $dr0[$tablem] = $mpy[$tablem] + 1; $dr3[$tablem] = $mpy[$tablem]; $dr5[$tablem] = $grand[$tablem]; } foreach my $tablen ( 0 .. $LST - 1 ) { $dr1[$tablen] = $mpy[ $tablen + 1 ]; $dr2[$tablen] = $mpe[$tablen]; $dr4[$tablen] = $popbyyer[ $tablen + 1 ]; } popfileb(); #write_to_output(); print qq{DONE!!\n} or croak 'unable to print to screen'; my @aod = map { 'n' x $total; } 1 .. $mpy[$NEG]; #increase - decrease array; sub popfileb { my $countgen0 = 0; foreach my $dlp ( 0 .. $LST - 1 ) { foreach my $yb ( $dr0[$dlp] .. $dr1[$dlp] ) { my $countgen = $countgen0++ - $dr3[$dlp]; my $change = $countgen * $dr4[$dlp]; my $incrs = int( $dr2[$dlp] + $change ); my $incrsdel = int( $dr5[$dlp] + abs $change ); my $chntot = $incrsdel + $incrs; foreach my $xb ( 0 .. $total ) { #set first row; if ( $yb == 0 ) { if ( $xb < $dr2[$dlp] ) { substr $aod[0], $xb, 1, 'a'; } else { substr $aod[0], $xb, 1, 'n'; } } #set increasing rows; elsif ( $dr4[$dlp] >= 0 ) { if ( $xb < $dr5[$dlp] + $incrs ) { if ( substr( $aod[ $yb - 1 ], $xb, 1 ) eq 'd' +) { substr $aod[$yb], $xb, 1, 'd'; } else { substr $aod[$yb], $xb, 1, 'a'; } } else { substr $aod[$yb], $xb, 1, 'n'; } } #set decreasing rows; else { if ( $xb <= $chntot ) { if ( substr( $aod[ $yb - 1 ], $xb, 1 ) eq 'd' +) { substr $aod[$yb], $xb, 1, 'd'; } elsif ( substr( $aod[ $yb - 1 ], $xb, 1 ) eq ' +a' ) { substr $aod[$yb], $xb, 1, 'a'; } else { substr $aod[$yb], $xb, 1, 'n'; } } else { substr $aod[$yb], $xb, 1, 'n'; } } } #set random decreased cell; if ( $dr4[$dlp] < 0 ) { #pre-populate deletion array; my @delarray = ( 0 .. $chntot ); fisher_yates_shuffle( \@delarray ); my $cndiea = 0; foreach my $del ( 0 .. $chntot ) { if ( defined substr( $aod[$yb], $del, 1 ) && substr( $aod[$yb], $del, 1 ) eq 'd' ) { $cndiea++; } } my $cnr = $cndiea; for my $xd ( 0 .. $chntot ) { my $xda = $delarray[$xd]; if ( defined substr( $aod[$yb], $xda, 1 ) && substr( $aod[$yb], $xda, 1 ) eq 'a' && $cnr < $incrsdel ) { substr $aod[$yb], $xda, 1, 'd'; $cnr++; } } } } } open my $DATABASE, '>', 'tmp/test.txt' or croak 'dataout not made. +'; foreach my $y ( 0 .. $gener ) { print {$DATABASE} $aod[$y] or croak 'unable to print'; print qq{Printing line $y of $mpy[$NEG]\n} or croak 'unable to print to screen'; print {$DATABASE} qq{\n} or croak 'unable to print'; } close $DATABASE or croak 'data1 not closed.'; return; } sub fisher_yates_shuffle { my $deck = shift; my $i = @{$deck}; while ( $i-- ) { my $j = int rand( $i + 1 ); @{$deck}[ $i, $j ] = @{$deck}[ $j, $i ]; } return; } exit;

Now, right now it still puts a huge amount of data into memory but I'm experimenting with writing the data into tmp/test.txt one line at a time using Tie::File. So any suggestions will help.

And of course, there's still the issue of why my son's almost new Win 7 gaming machine isn't performing as well on this as my Win32 XP Pro machine.


Comment on Re: Handling HUGE amounts of data
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://885386]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (8)
As of 2014-12-25 11:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (160 votes), past polls