Beefy Boxes and Bandwidth Generously Provided by pair Networks DiBona
Perl-Sensitive Sunglasses
 
PerlMonks  

Yet more While issues

by Dandello (Beadle)
on Mar 16, 2011 at 19:11 UTC ( #893618=perlquestion: print w/ replies, xml ) Need Help??
Dandello has asked for the wisdom of the Perl Monks concerning the following question:

Well, sort of

I'm trying to speed up a subroutine by going from this

sub popnum3 { my ( $x, $y, $z, $zazb ) = @_; if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { while (1) { my $xda = int rand( $total + 1 ); if ( substr( $zazb, $xda, 1 ) eq 'c' ) { $aob[$x][$y] = $aob[$xda][ $y - 1 ] * ( 1 + $z ); last; } } } return $aob[$x][$y]; }
Which gets - for obvious reasons - REALLY slow on large strings, to something like this:
sub popnum3 { my ( $x, $y, $z, $zazb ) = @_; if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { while (1) { my @cPosns; push @cPosns, pos $zazb while $zazb =~ m{(?=c)}g; my $offset = splice @cPosns, rand @cPosns, 1; $aob[$x][$y] = $aob[$offset][ $y - 1 ] * ( 1 + $z ); last; } } return $aob[$x][$y]; }
Which I just can't seem to get my head around to get it to work without throwing 'undefined value' warnings all over the place.

For both these, $x and $y are 2d array references, $z is a predetermined random number and $zazb is a string containing a more or less random number of 'a's, 'c's, 'y's and 'x's (and may be over than 25,000 characters long). This subroutine is supposed to locate one of the 'c's in $zazb, (without iterating over the entire string repeatedly) grab its mate from the @aob array (which is a number) and determine the value of $aob[$x][$y] based on the value of 'c's mate.

This code

my @cPosns; push @cPosns, pos $zazb while $zazb =~ m{(?=c)}g; my $offset = splice @cPosns, rand @cPosns, 1;
is adapted from http://www.perlmonks.org/?node_id=891330

Update

Despite ikegami's input, I still can't get anything even closely related to the above code to work for this instance. However, since the parameters of the model data has changed, it's no longer an issue as all the indexes for 'c' need to saved to a file for data-munging.

So, this does a good and fast job finding the indices for 'c' from @aod (the array of strings containing 'c'.)

tie my @aod, 'Tie::File', 'bias/array.txt', recsep => "\n"; my $letter = 'c'; my $result = 0; open my $DATABASE, '>', $datafileout or croak 'dataout not made.'; foreach my $r (0 .. $gener){ $result = index($aod[$r], $letter); while ($result != -1){ print {$DATABASE} qq{$result,}; my $offset = $result + 1; $result = index($aod[$r], $letter, $offset); } print {$DATABASE} qq{\n} or croak 'unable to print'; } close $DATABASE or croak 'data1 not closed.';
then for the random indexes ($off is the array of 'c' indices.)
my @offst = split /\,/xsm, $off[$y - 1];
then to get the random offset
my $index = int rand ($#offst); $offset = $offst[$index];
It may not be pretty - but it does work.

Comment on Yet more While issues
Select or Download Code
Re: Yet more While issues
by ikegami (Pope) on Mar 16, 2011 at 19:22 UTC

    For clarity more than for speed,

    my $offset = splice @cPosns, rand @cPosns, 1;
    should be
    my $offset = $cPosns[rand @cPosns];
    and
    while (1) { ... last; }
    should simply be
    ...

    For speed, don't copy the zazb argument into a local variable. Access $_[3] directly.

    A possible tweak:

    push @cPosns, pos $zazb while $zazb =~ m{(?=c)}g;

    might be faster as

    push @cPosns, $-[0] while $zazb =~ m{c}g;

    or maybe even

    my $pos = -1; push @cPosns, $pos while ($pos = index($zazb, 'c', $pos+1)) >= 0;

    Benchmark and find out.

    Which I just can't seem to get my head around to get it to work without throwing 'undefined value' warnings all over the place.

    This will occur when 'c' doesn't occur in $zazb. Based on your first snippet, you want

    sub popnum3 { my ( $x, $y, $z ) = @_; # $_[3] is $zazb if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { my @cPosns; push @cPosns, $-[0] while $_[3] =~ m{c}g; if (@cPosns) { my $offset = $cPosns[rand @cPosns]; $aob[$x][$y] = $aob[$offset][ $y - 1 ] * ( 1 + $z ); } } return $aob[$x][$y]; }

      I got it - at least it's stopped throwing warnings and it's a whole lot faster that what I had before.

      Since popnum3 is called from inside an if/else

      else { my $pos = -1; push @cPosns, $pos while ($pos = index($zazb, 'c', $pos+1)) >= 0; my $offset = $cPosns[rand @cPosns]; $cell = sprintf '%.2f', popnum3( $x, $y, $copycop, $offset, ); }
      then
      sub popnum3 { my ( $x, $y, $z, $offset ) = @_; if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { while (1) { $aob[$x][$y] = $aob[$offset][ $y - 1 ] * ( 1 + $z ); last; } } return $aob[$x][$y]; }

      So far it's looking good. Thanks

        What's with the useless while(1) { ...; last; } again?

      Back to square one I guess. This

      sub popnum3 { my ( $x, $y, $z ) = @_; # $_[3] is $zazb if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { my @cPosns; push @cPosns, $-[0] while $_[3] =~ m{c}g; if (@cPosns) { my $offset = $cPosns[rand @cPosns]; $aob[$x][$y] = $aob[$offset][ $y - 1 ] * ( 1 + $z ); } } return $aob[$x][$y]; }
      only seems to work relatively short strings, but not when the string gets much more than 500 characters. Then it throws 'uninitialized value in $offset' errors which I'm guessing means the algorithm didn't find the one 'c' that had to be there.

      I admit I'm at my wit's end here. This looks like it should work, but it doesn't. There is a 'c' in every row, so that's not the problem.

      THIS works

      sub popnum3 { my ( $x, $y, $z, $zazb ) = @_; if ( $y == 0 ) { $aob[$x][0] = $initial * ( 1 + $z ); } else { while (1) { my $xda = int rand( $total + 1 ); if ( substr( $zazb, $xda, 1 ) eq 'c' ) { $aob[$x][$y] = $aob[$xda][ $y - 1 ] * ( 1 + $z ); last; } } } return $aob[$x][$y]; }
      But takes about 5 seconds per row on the full array and since there's 8400 rows in the full run, that's about 12 hours.

      Getting the run time down would be very helpful.

      Thanks

        Then it throws 'uninitialized value in $offset' errors

        I don't believe you when you say that snippet produced that warning.

        Then it throws 'uninitialized value in $offset' errors which I'm guessing means the algorithm didn't find the one 'c' that had to be there.

        No. If it doesn't find any 'c', it never uses $offset.

        PS - Don't count out half the warning (not even counting the line number).

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://893618]
Approved by ikegami
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (8)
As of 2014-04-18 04:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (461 votes), past polls