Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Running a simulation - expected outcome

by Cristoforo (Deacon)
on Sep 29, 2012 at 19:39 UTC ( #996416=perlquestion: print w/ replies, xml ) Need Help??
Cristoforo has asked for the wisdom of the Perl Monks concerning the following question:

Update: I believe the 2 programs below failed to count the steps. Especially, the first simulation 'hops' for 1 to 6 steps and counts that as 1 step. That is probably wrong and instead should count the cells covered in a hop. This revised program does that.

#!/usr/bin/perl use strict; use warnings; use Storable; use Statistics::Descriptive; my @grid = @{ retrieve('grid.dat') }; my %param = %{ retrieve('param.dat') }; # rows and cols parameters my @contaminated_walks; for (1 .. 100) { my $num_walks = 100; my ($x, $y) = (int(rand $param{range_contx}), int(rand $param{rang +e_conty})); my @walks; for (1 .. $num_walks) { my $steps = 100; my $infected; my $total_steps; #Inner loop to perform each step of a random walk for (1 .. $steps) { $total_steps += my $rand_steps = (1 + int( rand 6 )); last if $total_steps > $steps; my $random_num = rand; if($random_num < 0.25) { for (1 .. $rand_steps) { $x = ($x - 1) % $param{range_contx}; $infected += $grid[$x][$y] || 0; } } elsif ($random_num < 0.5) { for (1 .. $rand_steps) { $x = ($x + 1) % $param{range_contx}; $infected += $grid[$x][$y] || 0; } } elsif ($random_num < 0.75) { for (1 .. $rand_steps) { $y = ($y - 1) % $param{range_conty}; $infected += $grid[$x][$y] || 0; } } else { for (1 .. $rand_steps) { $y = ($y + 1) % $param{range_conty}; $infected += $grid[$x][$y] || 0; } } } push @walks, $infected; } push @contaminated_walks, scalar grep $_, @walks; } my $stat = Statistics::Descriptive::Sparse->new(); $stat->add_data(@contaminated_walks); printf "min-max %d-%d mean: %.1f std. deviation: %.1f count: %d\n", $stat->min, $stat->max, $stat->mean, $stat->standard_deviation, $s +tat->count; __END__ C:\Old_Data\perlp>perl t33.pl min-max 34-61 mean: 50.7 std. deviation: 5.6 count: 100
I would like to see if someone might explain why I'm getting different results from nearly identical, (I'll explain the difference below), simulation runs. The problem was posed on Perl Guru Forums here Creating a 100x100 grid in perl. I am not a student for this problem - just trying to solve the problem for myself.   :-)

The intent of the simulation is to randomly move around a grid a specified number of steps and at the end, see if you stepped upon an infected cell, (and then become infected). The specification was to create a 100 x 100 grid and moving randomly 1 to 6 cells, (up, down, left or right), move around the grid and record any steps upon an infected cell. (the specs had 100 infected out of 10,000) The specs also said to create an unchanging grid and run different simulations with an identical grid. (My script doesn't use an unchanging grid, but I don't think thats the problem here).

Here are the grid creation script,

#!/usr/bin/perl use strict; use warnings; use Storable; my $number_contaminant = 100; my %param = (range_contx => 100, range_conty => 100 ); my @grid; for (1 .. $number_contaminant) { #random positions of the contaminants, put the random number as in +tegrer my $x = int(rand $param{range_contx}); my $y = int(rand $param{range_conty}); redo if $grid[$x][$y]; # if already marked $grid[$x][$y] = 1; } store \@grid, 'grid.dat'; store \%param, 'param.dat';
and the simulation run against the grid,
#!/usr/bin/perl use strict; use warnings; use Storable; use Statistics::Descriptive; my @grid = @{ retrieve('grid.dat') }; my %param = %{ retrieve('param.dat') }; # range_contx and range_conty +parameters my @contaminated_walks; for (1 .. 100) { my $steps = 100; my $num_walks = 100; my ($x, $y) = (int(rand $param{range_contx}), int(rand $param{rang +e_conty})); my @walks; for (1 .. $num_walks) { my $infected; #Inner loop to perform each step of a random walk for (1 .. $steps) { my $random_num = rand; my $steps = (1 + int( rand 6 )); if($random_num < 0.25) { $x = ($x - $steps) % $param{range_contx}; } elsif ($random_num < 0.5) { $x = ($x + $steps) % $param{range_contx}; } elsif ($random_num < 0.75) { $y = ($y - $steps) % $param{range_conty}; } else { $y = ($y + $steps) % $param{range_conty}; } $infected += $grid[$x][$y] || 0; } push @walks, $infected; } push @contaminated_walks, scalar grep $_, @walks; } my $stat = Statistics::Descriptive::Sparse->new(); $stat->add_data(@contaminated_walks); printf "mean: %.f std. deviation: %.f count: %d\n", $stat->mean, $stat->standard_deviation, $stat->count; __END__ C:\Old_Data\perlp>perl his_stat.pl mean: 60 std. deviation: 5 count: 100

And, here is a solution I made that moves, in effect, to any random cell, (unrestrained by the 1 - 6 move in the other program).

#!/usr/bin/perl use strict; use warnings; use List::Util qw/ shuffle /; use Statistics::Descriptive; my @contaminated_walks; for (1 .. 100) { my $contaminated = 100; my $grid_rows = 100; my $grid_cols = 100; my $steps = 100; my $num_walks = 100; my @walks; for (1 .. $num_walks) { my @grid = # 1's and 0's randomly ordered shuffle( (1) x $contaminated, (0) x ($grid_rows * $grid_cols - $contaminated) ); # for each walk, 'grep' gets the count of contaminated cells push @walks, scalar grep $_, map $grid[rand @grid], 1 .. $step +s; } push @contaminated_walks, scalar grep $_, @walks; } my $stat = Statistics::Descriptive::Sparse->new(); $stat->add_data(@contaminated_walks); printf "mean: %.f std. deviation: %.f count: %d\n", $stat->mean, $stat->standard_deviation, $stat->count; __END__ C:\Old_Data\perlp>perl my_stat.pl mean: 64 std. deviation: 5 count: 100

My analysis was based on the likelyhood of stepping upon an uninfected cell (when there were 100 contaminated out of 10,000, (100 x 100)), was 9900/10,000 (or .99 probabilty). Then, for 100 runs, figured the likelyhood of not becoming infected was .99 ** 100, (.99 to the 100th power).

Thus the possibilty of being infected would be 1 - .99**100. (Because, in my simulation script, each step is independent of the one preceding it).

The method to calculate his is slightly different because, he must move at least 1 square and cannot randomly stay in the same cell, (taking 1 of the 9900 uninfected cells out of play for subsequent steps), as my solution can. I think then to get the probability here would be .99 for the first cell to be uninfected and the following cells visited to be .9899, .99 * (.9899 ** 99). Subtracting that from 1 should give the probabilty of infection.

My script's outcome gave .64 chance of being infected while his gave .60 chance. And I don't know why, they are, for practical purposes with the given parameters the same and I would have expected to get the same probabilty of infection for his run, but as seen, it is .04 less.

I guess I wonder why this is happening. Is my calculation in error or is my simulation code wrong? I don't think either is the case, although someone else might point out an error.

Update: I don't know why my readmore tags didn't work for the code portions.

Comment on Running a simulation - expected outcome
Select or Download Code
Re: Running a simulation - expected outcome
by BrowserUk (Pope) on Sep 29, 2012 at 19:54 UTC
    Update: I don't know why my readmore tags didn't work for the code portions.

    They did 'work' -- if you look at Seekers of Perl Wisdom you'll see -- that they just didn't do what you (or I) intuitively want them to do, when looking at the node directly.

    That is the way readmores are designed to work, which is why I find spoilers more useful.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Ok, I hadn't considered that as I was looking at the node directly, and not as a node on the Seekers of Perl Wisdom page.
Re: Running a simulation - expected outcome
by BrowserUk (Pope) on Sep 29, 2012 at 20:20 UTC

    The difference is that at any step point, his algorithm can only chose from a maximum (less when near edges) of 84 squares -- a 13 x 13 diamond pattern -- surrounding his current position.

    Your code, by virtue of not tracing a path, but simply picking any random square from the full set, has the full 10,000 choices each time.

    To correctly calculate his odds, you would need to factor in the odds of one or more of those 84 possibilities being contaminated, and then 1:84 chance that he will pick one of them.

    Ie. His odds of contamination are less, because he has less chance of reaching all of the cells. Or rather, any given contaminated cell.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Your analysis is based on the probablity of infection. The original analysis is based on the probablility of avoiding infection. It seems to me the result should be the same. I do see a minor "off by one" error in the original because it ignores the fact that he cannot move to the uninfected cell that he is already on.

      Bill

        Okay. But surely, one is just 1 - the other.

        I reckon the answer to be 0.373 / 0.617 on the basis of my own simulation; but I cannot make the math fit.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

      That isn't my reading of his algorithm. It looks to me like it can only choose 1 out of 24 squares - there isn't the possibility of moving diagonally. Also the mod means that the edge effects (should) disappear.

      The difference between the two simulations is as you say, however: he is constrained to the limited path but Cristoforo is not.

        That isn't my reading of his algorithm.

        You are probably right.

        I took cristoforo's description of the algorithm -- which includes this passage "moving randomly 1 to 6 cells" -- as correct. Looking back to the originator's code, I do not know where than came from, but I didn't look very hard now, or at all before posting.

        there isn't the possibility of moving diagonally.

        Hm. It depends how you interpret the phrase: moving randomly 1 to 6 cells, (up, down, left or right),, that not quite the same as "not diagonally". I took to mean that move 5 UP and 1 LEFT was a valid move. Hence I came up with my "13 x 13 diamond pattern" and 1:84:

        ......X...... .....XXX..... ....XXXXX.... ...XXXXXXX... ..XXXXXXXXX.. .XXXXXXXXXXX. XXXXXX.XXXXXX .XXXXXXXXXXX. ..XXXXXXXXX.. ...XXXXXXX... ....XXXXX.... .....XXX..... ......X......

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://996416]
Approved by hesco
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (12)
As of 2014-12-19 20:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (91 votes), past polls