http://www.perlmonks.org?node_id=996416

Cristoforo has asked for the wisdom of the Perl Monks concerning the following question:

Update: I believe the 2 programs below failed to count the steps correctly. In particular, the first simulation 'hops' 1 to 6 cells and counts that as 1 step. That is probably wrong; it should instead count every cell covered in a hop. This revised program does that.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Storable;
    use Statistics::Descriptive;

    my @grid = @{ retrieve('grid.dat') };
    my %param = %{ retrieve('param.dat') }; # rows and cols parameters

    my @contaminated_walks;
    for (1 .. 100) {
        my $num_walks = 100;
        my ($x, $y) = (int(rand $param{range_contx}), int(rand $param{range_conty}));
        my @walks;
        for (1 .. $num_walks) {
            my $steps = 100;
            my $infected;
            my $total_steps;
            #Inner loop to perform each step of a random walk
            for (1 .. $steps) {
                $total_steps += my $rand_steps = (1 + int( rand 6 ));
                last if $total_steps > $steps;
                my $random_num = rand;
                if($random_num < 0.25) {
                    for (1 .. $rand_steps) {
                        $x = ($x - 1) % $param{range_contx};
                        $infected += $grid[$x][$y] || 0;
                    }
                }
                elsif ($random_num < 0.5) {
                    for (1 .. $rand_steps) {
                        $x = ($x + 1) % $param{range_contx};
                        $infected += $grid[$x][$y] || 0;
                    }
                }
                elsif ($random_num < 0.75) {
                    for (1 .. $rand_steps) {
                        $y = ($y - 1) % $param{range_conty};
                        $infected += $grid[$x][$y] || 0;
                    }
                }
                else {
                    for (1 .. $rand_steps) {
                        $y = ($y + 1) % $param{range_conty};
                        $infected += $grid[$x][$y] || 0;
                    }
                }
            }
            push @walks, $infected;
        }
        push @contaminated_walks, scalar grep $_, @walks;
    }

    my $stat = Statistics::Descriptive::Sparse->new();
    $stat->add_data(@contaminated_walks);
    printf "min-max %d-%d mean: %.1f std. deviation: %.1f count: %d\n",
        $stat->min, $stat->max, $stat->mean, $stat->standard_deviation, $stat->count;
    __END__
    C:\Old_Data\perlp>perl t33.pl
    min-max 34-61 mean: 50.7 std. deviation: 5.6 count: 100

I would like to see if someone can explain why I'm getting different results from two nearly identical simulation runs (I explain the difference below). The problem was posed on Perl Guru Forums in the thread "Creating a 100x100 grid in perl". I am not a student working on this problem; I'm just trying to solve it for myself. :-)

The intent of the simulation is to move randomly around a grid for a specified number of steps and, at the end, see whether you stepped on an infected cell (and so became infected). The specification was to create a 100 x 100 grid and, moving randomly 1 to 6 cells at a time (up, down, left or right), walk around the grid and record any steps onto an infected cell. (The specs had 100 infected cells out of 10,000.) The specs also said to create an unchanging grid and run the different simulations against that identical grid. (My script doesn't use an unchanging grid, but I don't think that's the problem here.)

Here is the grid creation script,

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Storable;

    my $number_contaminant = 100;
    my %param = (range_contx => 100, range_conty => 100 );

    my @grid;
    for (1 .. $number_contaminant) {
        #random positions of the contaminants, put the random number as integer
        my $x = int(rand $param{range_contx});
        my $y = int(rand $param{range_conty});
        redo if $grid[$x][$y]; # if already marked
        $grid[$x][$y] = 1;
    }

    store \@grid, 'grid.dat';
    store \%param, 'param.dat';

and the simulation run against the grid,

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Storable;
    use Statistics::Descriptive;

    my @grid = @{ retrieve('grid.dat') };
    my %param = %{ retrieve('param.dat') }; # range_contx and range_conty parameters

    my @contaminated_walks;
    for (1 .. 100) {
        my $steps = 100;
        my $num_walks = 100;
        my ($x, $y) = (int(rand $param{range_contx}), int(rand $param{range_conty}));
        my @walks;
        for (1 .. $num_walks) {
            my $infected;
            #Inner loop to perform each step of a random walk
            for (1 .. $steps) {
                my $random_num = rand;
                my $steps = (1 + int( rand 6 ));
                if($random_num < 0.25) {
                    $x = ($x - $steps) % $param{range_contx};
                }
                elsif ($random_num < 0.5) {
                    $x = ($x + $steps) % $param{range_contx};
                }
                elsif ($random_num < 0.75) {
                    $y = ($y - $steps) % $param{range_conty};
                }
                else {
                    $y = ($y + $steps) % $param{range_conty};
                }
                $infected += $grid[$x][$y] || 0;
            }
            push @walks, $infected;
        }
        push @contaminated_walks, scalar grep $_, @walks;
    }

    my $stat = Statistics::Descriptive::Sparse->new();
    $stat->add_data(@contaminated_walks);
    printf "mean: %.f std. deviation: %.f count: %d\n",
        $stat->mean, $stat->standard_deviation, $stat->count;
    __END__
    C:\Old_Data\perlp>perl his_stat.pl
    mean: 60 std. deviation: 5 count: 100

And here is a solution I made that moves, in effect, to any random cell (unrestrained by the 1 to 6 cell move in the other program).

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw/ shuffle /;
    use Statistics::Descriptive;

    my @contaminated_walks;
    for (1 .. 100) {
        my $contaminated = 100;
        my $grid_rows = 100;
        my $grid_cols = 100;
        my $steps = 100;
        my $num_walks = 100;
        my @walks;
        for (1 .. $num_walks) {
            my @grid = # 1's and 0's randomly ordered
                shuffle( (1) x $contaminated,
                         (0) x ($grid_rows * $grid_cols - $contaminated) );
            # for each walk, 'grep' gets the count of contaminated cells
            push @walks, scalar grep $_, map $grid[rand @grid], 1 .. $steps;
        }
        push @contaminated_walks, scalar grep $_, @walks;
    }

    my $stat = Statistics::Descriptive::Sparse->new();
    $stat->add_data(@contaminated_walks);
    printf "mean: %.f std. deviation: %.f count: %d\n",
        $stat->mean, $stat->standard_deviation, $stat->count;
    __END__
    C:\Old_Data\perlp>perl my_stat.pl
    mean: 64 std. deviation: 5 count: 100

My analysis was based on the likelihood of stepping on an uninfected cell: with 100 contaminated cells out of 10,000 (100 x 100), that is 9900/10,000 (a probability of .99). Then, for the 100 steps of a walk, I figured the likelihood of not becoming infected was .99 ** 100 (.99 to the 100th power).

Thus the probability of being infected would be 1 - .99**100 (because, in my simulation script, each step is independent of the one preceding it).
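To make that arithmetic concrete, here is a minimal sketch that evaluates 1 - .99**100 directly. The sub name p_infected_independent and its parameter names are made up for illustration; it only assumes, as stated above, that every step lands on an independently chosen cell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: chance that a walk of $steps independent,
# uniformly random cell visits touches at least one contaminated cell.
sub p_infected_independent {
    my ($cells, $contaminated, $steps) = @_;
    my $p_clean = ($cells - $contaminated) / $cells;   # .99 for 100 of 10,000
    return 1 - $p_clean ** $steps;                     # 1 - .99**100
}

printf "P(infected) = %.3f\n", p_infected_independent(10_000, 100, 100);
# prints "P(infected) = 0.634"
```

With 100 walks per run, that works out to roughly 63 infected walks per run, which is in line with the mean of 64 my script reports.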

The method to calculate his is slightly different because he must move at least 1 square and cannot randomly stay in the same cell, as my solution can (this takes 1 of the 9900 uninfected cells out of play for subsequent steps). I think the probability here would be .99 for the first cell to be uninfected and .9899 for each of the following cells visited, giving .99 * (.9899 ** 99). Subtracting that from 1 should give the probability of infection.
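The same kind of sketch for this second estimate, again with a made-up sub name (p_infected_no_stay). It simply evaluates 1 - .99 * (.9899 ** 99) as written above and does not try to model the actual walk, which can revisit cells it has already crossed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the first cell is clean with probability $p_first,
# each of the remaining ($steps - 1) cells with probability $p_next.
sub p_infected_no_stay {
    my ($p_first, $p_next, $steps) = @_;
    # '**' binds tighter than '*', which binds tighter than binary '-'
    return 1 - $p_first * $p_next ** ($steps - 1);
}

printf "P(infected) = %.4f\n", p_infected_no_stay(0.99, 0.9899, 100);
```

Because .9899 is smaller than .99, this estimate comes out slightly above the 1 - .99**100 figure rather than below it, so by itself it would not account for a result of .60.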

My script's outcome gave a .64 chance of being infected while his gave a .60 chance, and I don't know why. For practical purposes, with the given parameters, the two are the same, and I would have expected to get the same probability of infection for his run, but as seen, it is .04 less.

I guess I wonder why this is happening. Is my calculation in error or is my simulation code wrong? I don't think either is the case, although someone else might point out an error.

Update: I don't know why my readmore tags didn't work for the code portions.