http://www.perlmonks.org?node_id=885198


in reply to Re^2: Handling HUGE amounts of data
in thread Handling HUGE amounts of data

99.99% there - it ran out of memory when I hit the close button on the little Perl/Tk popup that comes up at the end to announce the data run was done.

Converting @aod into a string was a big improvement, but so was finding an array that was hiding in a subroutine. Sometimes you're just too close to see things.

Since I know the final user (my boy child) will want even more data, there's still a little more work to do.

#model 1;
sub popnum1 {
    ( $x, $y, $z ) = @_;
    if ( $y == 0 ) {
        $aob[$x][0] = $initial + $z;
    }
    else {
        if ( substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' ) {
            $aob[$x][$y] = $initial + $z;
        }
        else {
            $aob[$x][$y] = $z + $aob[$x][ $y - 1 ];
        }
    }
    return $aob[$x][$y];
}

This is one version of the @aob generator. It's called only when the corresponding element in @aod is an 'a' (so it varies from one row to the next). $z is a freshly generated random number (a floating-point value, positive or negative); that got rid of another memory-eating array in favor of a single variable.
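For anyone following along, here is a minimal sketch of the string-per-row idea: each element of @aod is one string, and a cell is read with substr instead of a nested array lookup. The sample data and the is_a helper are made up for illustration and are not from the real script.

```perl
use strict;
use warnings;

# Hypothetical sample data: one string per row, one character per column.
# An 'a' marks a cell for which the generator would be called.
my @aod = ( 'aada', 'adaa', 'aaad' );

# Return true when column $x of row $y holds an 'a'.
sub is_a {
    my ( $x, $y ) = @_;
    return substr( $aod[$y], $x, 1 ) eq 'a';
}

# Count the 'a' cells in each row without splitting the string;
# tr/a// with an empty replacement only counts, it does not modify.
my @count = map { tr/a// } @aod;
print "row $_: $count[$_] a-cells\n" for 0 .. $#aod;
```

A row of N cells then costs roughly N bytes plus one scalar's overhead, rather than N separate scalars.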

So @aob is the last big array to be tamed. But I'm gaining on it.;)

Re^4: Handling HUGE amounts of data
by BrowserUk (Patriarch) on Jan 31, 2011 at 07:04 UTC
    So @aob is the last big array to be tamed.

    Did you try Tie::Array::Packed?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I'll try to decipher its vagaries when I've had some sleep. But it looks promising.

      I'm just glad the blasted program's working adequately now.

      In tab delimited, the final data file comes in at 382 Meg.

Re^4: Handling HUGE amounts of data
by Dandello (Monk) on Jan 31, 2011 at 19:45 UTC

    Well, it still throws an 'out of memory' when I close the little Perl/Tk popup that announces the script has finished running.

    I assume I've done this right as BrowserUk suggested using Tie::Array::Packed to save on RAM:

    tie @aob, 'Tie::Array::Packed::DoubleNative';

    #model 1;
    sub popnum1 {
        ( $x, $y, $z ) = @_;
        if ( $y == 0 ) {
            $aob[$x][0] = $initial + $z;
            $zaza = $aob[$x][0];
        }
        else {
            if ( substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' ) {
                $aob[$x][$y] = $initial + $z;
                $zaza = $aob[$x][$y];
            }
            else {
                $aob[$x][$y] = $z + $aob[$x][ $y - 1 ];
                $zaza = $aob[$x][$y];
            }
        }
        return $zaza;
    }

    I figure that returning a single variable ($zaza) is more efficient than returning $aob[$x][$y], though it's hard to tell.
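For what it's worth, a packed array stores plain numbers, so (as far as I understand it) the array references that $aob[$x][$y] relies on may not survive in a tied packed @aob. Below is a minimal sketch of the underlying idea, using only core pack/unpack on one flat string rather than the Tie::Array::Packed API. The dimensions and the helper names (cell_offset, set_cell, get_cell) are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical dimensions, chosen only for this illustration.
my ( $cols, $rows ) = ( 4, 3 );
my $width = length( pack( 'd', 0 ) );    # bytes per native double

# One flat string holds every cell; all-zero bytes unpack as 0.
my $buf = "\0" x ( $cols * $rows * $width );

# Map the two-dimensional ( $x, $y ) onto a single byte offset.
sub cell_offset {
    my ( $x, $y ) = @_;
    return ( $y * $cols + $x ) * $width;
}

# Overwrite one cell in place with the packed double.
sub set_cell {
    my ( $x, $y, $value ) = @_;
    substr( $buf, cell_offset( $x, $y ), $width, pack( 'd', $value ) );
    return $value;
}

# Read one cell back as an ordinary Perl number.
sub get_cell {
    my ( $x, $y ) = @_;
    return unpack( 'd', substr( $buf, cell_offset( $x, $y ), $width ) );
}

set_cell( 2, 1, 3.25 );
print get_cell( 2, 1 ), "\n";
```

The point of the flattened index is that the whole grid lives in one scalar, so there is no per-cell scalar overhead; the cost is a pack/unpack on every access.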

      I figure that returning a single variable ($zaza) is more efficient than returning $aob[$x][$y]

      Returning $aob[$x][$y] is returning a single variable. Whether you dereference the arrays here:

      $zaza = $aob[$x][$y];

      Or here:

      return $aob[$x][$y];

      Makes no difference.

      However, declaring ( $x, $y, $z ) and $zaza with my would make some difference, as lexicals are more efficient than globals. Plus you could then benefit from use strict.
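Apart from speed, per-call lexicals also stop one subroutine from trampling another's arguments. A small sketch of that effect follows; every name in it is hypothetical and none of it comes from the real script.

```perl
use strict;
use warnings;

our ( $gx, $gy );    # package globals, shared by every call

sub helper_global { ( $gx, $gy ) = @_; return }

sub area_global {
    ( $gx, $gy ) = @_;
    helper_global( 1, 1 );    # overwrites the caller's $gx and $gy
    return $gx * $gy;
}

sub helper_lexical { my ( $x, $y ) = @_; return }

sub area_lexical {
    my ( $x, $y ) = @_;       # private to this call
    helper_lexical( 1, 1 );   # cannot touch the caller's $x and $y
    return $x * $y;
}

print area_global( 3, 4 ),  "\n";    # 1, not 12
print area_lexical( 3, 4 ), "\n";    # 12
```

The same clobbering can happen with a single file-level my declaration shared by several subs, since that is still one set of variables.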

      But your subroutine can be refactored as:

      sub popnum1 {
          my( $x, $y, $z ) = @_;
          if ( $y == 0 ) {
              return $aob[$x][0] = $initial + $z;
          }
          else {
              if ( substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' ) {
                  return $aob[$x][$y] = $initial + $z;
              }
              else {
                  return $aob[$x][$y] = $z + $aob[$x][ $y - 1 ];
              }
          }
      }

      which saves a temporary variable and two double dereferences.

      Personally, I think I'd code that as:

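      sub popnum1 {
          my( $x, $y, $z ) = @_;
          # Restart from $initial on the first row ( $y == 0 ) or when the
          # cell above is not 'a'; otherwise accumulate onto the cell above.
          return $aob[ $x ][ $y ] =
              !$y || substr( $aod[ $y - 1 ], $x, 1 ) ne 'a'
                  ? $initial + $z
                  : $z + $aob[$x][ $y - 1 ];
      }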

      Though I'd want to verify that my logic transformation was correct. That should be appreciably more efficient than your original above.
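One way to check a transformation like this is to run the nested-if form and the single-expression form over the same small grid and compare the results. The sketch below does that with made-up data ($initial, the three @aod rows, and a deterministic stand-in for the random $z are all assumptions). Note that the condition has to be true when $y is 0, so the first row takes the $initial + $z branch just as the nested version does.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the script's globals.
my $initial = 100;
my @aod     = ( 'aba', 'aab', 'baa' );    # one string per row
my ( @aob_if, @aob_expr );

# Nested-if form, as in the longer version of the subroutine.
sub popnum_if {
    my ( $x, $y, $z ) = @_;
    if ( $y == 0 ) {
        return $aob_if[$x][0] = $initial + $z;
    }
    if ( substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' ) {
        return $aob_if[$x][$y] = $initial + $z;
    }
    return $aob_if[$x][$y] = $z + $aob_if[$x][ $y - 1 ];
}

# Single-expression form: restart on the first row or when the cell
# above is not 'a'; otherwise accumulate.
sub popnum_expr {
    my ( $x, $y, $z ) = @_;
    return $aob_expr[$x][$y] =
      ( !$y || substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' )
      ? $initial + $z
      : $z + $aob_expr[$x][ $y - 1 ];
}

# Feed both forms identical inputs for every cell and count disagreements.
my $mismatches = 0;
for my $y ( 0 .. $#aod ) {
    for my $x ( 0 .. length( $aod[$y] ) - 1 ) {
        my $z = $x + 10 * $y + 0.5;    # deterministic stand-in for the random value
        $mismatches++
          if popnum_if( $x, $y, $z ) != popnum_expr( $x, $y, $z );
    }
}
print $mismatches ? "forms disagree\n" : "forms agree\n";
```

Swapping the condition for one that is false at $y == 0 makes the first row read $aob[$x][-1] instead, which is the kind of slip a comparison like this is meant to catch.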



        Actually I am using strict - the my declaration for $x, $y, and $z was elsewhere. And this was my original before I got creative:

        sub popnum1 {
            ( $x, $y, $z ) = @_;
            if ( $y == 0 ) {
                $aob[$x][0] = $initial + $z;
            }
            else {
                if ( substr( $aod[ $y - 1 ], $x, 1 ) ne 'a' ) {
                    $aob[$x][$y] = $initial + $z;
                }
                else {
                    $aob[$x][$y] = $z + $aob[$x][ $y - 1 ];
                }
            }
            return $aob[$x][$y];
        }

        I've moved the Perl/Tk done snippet into a different script and running the full load still throws an 'out of memory' during what should be cleanup. Now I have to track down exactly where.