http://www.perlmonks.org?node_id=434533


in reply to Re: Efficient Assignment of Many People To Many Locations?
in thread Efficient Assignment of Many People To Many Locations?

To paraphrase my comment in a recent node, Re: Student Class Section Lottery, if you generate about 30 random solutions and score them with some kind of criteria, the top scorer has a high likelihood of being close to the optimal solution.

(I'm happy to share this as one of the most useful/interesting things I learned in all my college days, from an Operations Research class some 20+ years ago. :-)

  • Comment on Re^2: Efficient Assignment of Many People To Many Locations?

Replies are listed 'Best First'.
Re^3: Efficient Assignment of Many People To Many Locations?
by CountZero (Bishop) on Feb 25, 2005 at 22:40 UTC
    That is a very interesting comment. If you can live with "good enough" rather than "maximum" it is indeed worth trying.

    Is the number "30" somehow related to the number of elements in the set of all possibilities or is it a value which is valid over a large spread of sets?

    Perhaps someone more versed in statistics than me can look into the following: given a problem which has 1,000,000 solutions, generate 30 random numbers between 1 and 1,000,000. If the score of the solution is equal to the value chosen, what are the chances that one of these random values is within 10% of the maximum value (1,000,000)?

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      The cool part is that 30 is valid over an arbitrarily large spread of sets. Yes, we need a statistician to show this (or I have to hunt down some 20-year-old notebooks), and of course you could generate more than 30 solutions, but there's a knee-of-a-curve thing about "30": basically it's hard to draw 30 "solutions" from a pot full of randomly arranged solutions and not have "one" come from each of the extreme edges of possible solutions, i.e. from better than 90th percentile and worse than 10th. That's what makes this the Law of Large Numbers--it is irrelevant how many possible solutions there really are.

      There is a 10% chance at any iteration of choosing a value in the top 10% (duh) when you are just selecting linearly numbers on a number line. So, the probability of choosing a number in the top 10% over 30 iterations is simply:

      printf("%.0f%%",100*(1-.9**30));

      I don't see the magic of the number 30 here-- the cost/benefit of iterations to improved results appears to be linear to me. You get results that are twice as good by doing twice as many iterations.

      It's easy to conceive of a non-linear results set where you are likely to come nowhere near the maximum value by picking some smallish number of random iterations.

      Try, for instance, finding the maximum value of 1/10**N where N is an integer between 0 and 1,000,000. What is the percent error between your randomly selected max over thirty iterations and the actual max value? My money says it is 100%. Of course, evaluating the linear variance of an exponential function may be a bit of a stretch...

        Concerning the number 30, you are correct, there's nothing particularly magical about it except that it's not too difficult to remember and it gives pretty good results for this general problem, i.e. if you want to be 95.8% sure that one of your randomly generated solutions is in the top 10% of all solutions, run 30 trials.

        But to play with your equation, I submit the following:

        use strict; use warnings; # We'd like to see the likelihood that by running a given number # of trials that one of these trials gives a result in some top X% # of all the possible results. # # As an example, suppose we need to generate a solution in the top # 10% of all possible solutions, i.e. we want to exceed a "clip # level" of 0.90. How many trials would we need to run to have an # X% chance that one of those solutions out of all those trials is # in that top 10%? # # Generating a simple little table shows us the percentage chance # that one of those trials will be in that bracket. For a 95.8% # chance that one of our trials is in the top 10%, our little # table shows us that we'd need to run 30 trials. # # Running 31 trials would boost our likelihood of being in that # top 10% by about 0.4%. # # Changing the clip level to 0.95 and upping our upper bound to # 65 shows that running 62 trials would give us that 95.8% # confidence of one of our trials being in that set of solutions # in the 95-100% range of best solutions. By running only 30 # trials, we'd only be 78.5% sure that one of our solutions is # superior to 95% of all possible solutions. my $want_soln_above_this_clip_level = 0.90; my $min_no_trials = 15; my $max_no_trials = 35; my $value; my $prev_value = 0; foreach ( $min_no_trials .. $max_no_trials ) { print "$_\t"; #printf("%.1f%%",100*(1-.9**$_)); $value = 100*(1-$want_soln_above_this_clip_level**$_); printf("%.1f%%", $value); print "\t"; printf("%.1f%%", $value - $prev_value); print "\n"; $prev_value = $value; } __DATA__ 15 79.4% 79.4% 16 81.5% 2.1% 17 83.3% 1.9% 18 85.0% 1.7% 19 86.5% 1.5% 20 87.8% 1.4% 21 89.1% 1.2% 22 90.2% 1.1% 23 91.1% 1.0% 24 92.0% 0.9% 25 92.8% 0.8% 26 93.5% 0.7% 27 94.2% 0.6% 28 94.8% 0.6% 29 95.3% 0.5% 30 95.8% 0.5% 31 96.2% 0.4% 32 96.6% 0.4% 33 96.9% 0.3% 34 97.2% 0.3% 35 97.5% 0.3%