Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Weighted random numbers generator

by spurperl (Priest)
on Mar 13, 2003 at 16:40 UTC ( #242744=perlquestion: print w/ replies, xml ) Need Help??
spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

Disclaimers:
* This is more a call-for-code-feedback than a question...
* Please mentally perform s/random/pseudo-random/ on this post

A weighted random generator is needed when events should happen at random, but some should have a higher probability of happening than others ("All animals are equal, but pigs are more equal than the others" ... Orwell). For example, I'd like to decide between 0, 1 and 2 randomly, with 0 and 1 having an equal chance to be chosen, and 2 having a chance that is twice higher.

The following code shows a function implementing this weighted random generator, with some test code. Please tell me what do you think about it... Any obvious slumps ? What about efficiency ?

Thanks in advance
#!/usr/local/bin/perl -w use strict; # Given an array of weights, chooses an array element # pseudo-randomly based on those weights. # # For example, given (1, 1.25, 3.6, 2) # The chance that the 2nd element will be chosen is # 3.6 times as large as the chance that the 0th element # will be chosen # sub choose_weighted { my @weights = @{$_[0]}; my $acc = 0; my @acc_arr; foreach (@weights) { $acc += $_; push(@acc_arr, $acc); } my $rand_val = $acc * rand; my $i = 0; ++$i while ($acc_arr[$i] <= $rand_val); return $i; } # Test code - just to prove that we get reasonable # distributions # # With the test array used below, $count[3] obviously # should be about twice as large as $count[1], etc... # my @ss = (1.75, 2, 3.6, 4); my @count = (0, 0, 0, 0); for (my $i = 0; $i < 500000; ++$i) { ++$count[choose_weighted(\@ss)]; } $, = "\n"; print @count;

Comment on Weighted random numbers generator
Download Code
•Re: Weighted random numbers generator
by merlyn (Sage) on Mar 13, 2003 at 16:56 UTC

      There's probably a CPAN module that does it.

      I have a similar problem and already searched CPAN, but couldn't find anything. Any ideas how such an algorithm might be called?

        I'm not sure the algorithm would have a special name.

        It's just a name to generate "weighted uniform" distribution.
        I hope that the nodes in this thread helped you accomplish what you want - there is the initial solution and many improvements and ideas.
Re: Weighted random numbers generator
by blokhead (Monsignor) on Mar 13, 2003 at 17:05 UTC
    I'm no probabalist, but your code seems on the ball to me. Here's an alternative method I use from time to time that doesn't require the creation of an additional accumulating-weight array:
    sub choose_weighted { my $weights = shift; my $total = 0; $total += $_ for @$weights; my $rand_val = $total * rand; my $i = -1; $rand_val -= $weights->[++$i] while ($rand_val > 0); return $i; }

    1. Pick a random number less than the total
    2. At each position of the array, subtract the weight at that position
    3. When the result is negative, return that position

    I have found this algorithm easier to adapt for multiple selections, i.e., calling choose_weighted( [1.75, 2, 3.6, 4], 2 ) and getting back two (different) numbers randomly chosen with this weighting. However, once you start doing this, you have to modify the array of weights (each time you select an item, set its weight to zero and subtract accordingly from $total), which may be undesired here.

    Cheers,

    blokhead

Re: Weighted random numbers generator
by pg (Canon) on Mar 13, 2003 at 17:22 UTC
    Your way is basically to redistribute the random numbers generated by random(). Seems to me, the problem is that its performance would go down, when the number of sections goes up. It is an ~o(n), and the internal random() is o(1).

    Perl’s algorithm to generate random number is in the same algorithm family with the following: (srand() plays the role to determine the initial seed).
    this_seed = (last_seed * 69069) % 2 ** 32; (equation 1)
    this_random_number = this_seed / 2 ** 32; (equation 2)
    
    You may want to play with it, and see whether you can find a way to distribute the numbers weighted, and at the same time sacrifice less performance.

    Update

    Agree with no_slogan, binary search, if you choose to stick with random().
      There's no way you can possibly do it in better than O(n) time, since you're going to have to look at each weight at least once (otherwise, ((1)x100, 10000) wouldn't be treated properly).
      Seems to me, the problem is that its performance would go down, when the number of sections goes up. It is an ~o(n), and the internal random() is o(1).

      Using a modified binary search, you can get O(log n) performance. You'll need to precompute @acc_arr instead of building it on the fly, of course. See Math::IntervalSearch, though that's not optimized for the random number generation case.

      If the weights aren't too vastly different (for example, [(1) x 1000, 1_000_000]), you can get a O(1) solution by slicing up @acc_arr into equal-sized intervals. Details left as an exercise for the reader. Update: This is roughly what dga is proposing. With care, it's possible to remove the rounding-off from his solution.

Re: Weighted random numbers generator
by dga (Hermit) on Mar 13, 2003 at 17:50 UTC

    If performance is an issue then you may consider caching the code which generates the numbers.

    In this entry I have set up a $funcs scalar to hold some code refs depending on how the function was called ie. the weights.

    The function flattens the elements sent in into a text string then uses that as a key value to save the code ref under. Then it calculates some values which normalize the input probabilities into an array which will have about 100 elements. Each element of the array stores the desired output value. We then build an anonymous function which returns a random output values based on the real number of items in the array. We precompute and store the number into a scalar before building the function so we dont have to get the size of the array every time we make up a random value.

    This function would work for N different weightings used in the same program since it would make up a function to do each distinct weighting and save it off for later use.

    This could be extended into a constructor such that you pass in the relative weights and the constructor returns the code ref directly to the caller so that it is no longer necessary to even pass the weighting parameters on each call. In the sample program given this caching is a huge win since except for the first time we do a code lookup and then reference into a fixed length array by a random subscript each time a number is desired. If the sub passed the code ref out directly then this could be at least twice as fast since the key construction to lookup the code probably takes as long as getting the random number from the function.

    Caveats: If you have large number of outcomes then 100 will not be enough slots. Changing the numerator on the $scale= line will add more slots and use more space accordingly. Also I am taking the weightings by value from @_ where the original poster was passing a reference. Be sure to include the block which encloses the definition of funcs and weighted_rand together to have presistance of your generated functions.

    Have Fun

    { my $funcs; sub weighted_rand { my @weight=@_; my $key=join(',', @weight); return &{$funcs->{$key}} if($funcs->{$key}); # use existing funct +ion my $sum; map { $sum += $_ } @weight; my $scale=100/$sum; my @outcome; my $outcome=0; foreach my $w ( @weight ) { my $cnt=sprintf "%.0f", $w*$scale; # get a rounded number of sl +ots push @outcome, (split '', $outcome x $cnt); $outcome++; } my $outcount=@outcome; # if the rounding gets us 99 or 101 this w +ill adjust the function $funcs->{$key} = sub { $outcome[rand($outcount)] }; return &{$funcs->{$key}}; } }
Re: Weighted random numbers generator
by Zaxo (Archbishop) on Mar 13, 2003 at 18:05 UTC

    You can swap memory for time to do this in O(1). Populate an array with your values in the distribution you want, and select a random index.

    Your example, ( 1, 1.25, 3.6, 2), is ( 20/20, 25/20, 72/20, 40/20), so:

    { my @dist = ( (0) x 20, (1) x 25, (2) x 72, (3) x 40 ); sub choose_weighted { $dist[ rand(@dist)]; } }
    If time is not an issue, your method is fine, except that < is probably better than <= in the comparisons. That is because rand() returns a number less than one.

    After Compline,
    Zaxo

        But it's only set up once. That's why the closure. Overhead is not charged in scaling arguments, and array indexing is certainly O(1) for positional representations - up until you need bigints for the indexes.

        After Compline,
        Zaxo

Re: Weighted random numbers generator
by crenz (Priest) on Mar 13, 2003 at 18:26 UTC

    If I am correct, all solutions presented so far focus on getting one element. I've been looking for an efficient way to do exactly what spurperl is looking for, only for a number of samples. Ie., I want to say

    @indices = DrawSamplesUneven(@elements, 5);

    and get five indices. Any ideas?

    Update: See my previous question Drawing samples from a set (regarding drawing one sample)

    Update: Forgot to mention that of course I want to get five different indices.

      My code above would work nicely since your 5 samples would use the same weighting and so you get the speed win. Build a tiny wrapper to call it 5 times.

      For Example:

      sub lots_o_indexes { my($count)=@_; my @indexlist; foreach (1..$count) { push @indexlist, weighted_rand(@weights); } return @indexlist; }
Re: Weighted random numbers generator
by thraxil (Prior) on Mar 13, 2003 at 18:46 UTC
Re: Weighted random numbers generator
by DrManhattan (Chaplain) on Mar 13, 2003 at 21:07 UTC
    After typing this up, I see Zaxo already posted the same idea. Ah well, here it is anyway.
    #!/usr/bin/perl use strict; # Assign a weight to each item. In this example # pigs get twice the weight of dogs or cows my %weight = ( "dog" => 1, "cow" => 1, "pig" => 2 ); # Create an array with a number of elements # equal to the weights of the items my @bucket; foreach my $animal (keys %weight) { push @bucket, ($animal) x $weight{$animal}; } # @bucket now looks like this: # # $bucket[0] = "dog" # $bucket[1] = "cow" # $bucket[2] = "pig" # $bucket[3] = "pig" # # "pig" has twice as many slots as "dog" or "cow" # The choose_weighted() subroutine now just has to # pick a random element from the array. sub choose_weighted { my $bucket = shift; return $bucket->[rand(@$bucket)]; } # Test code to demonstrate it works my %count; for (1..1000) { my $animal = &choose_weighted(\@bucket); $count{$animal}++; } foreach my $animal (keys %count) { print "$animal: $count{$animal}\n"; }

    -Matt

      I think that this solution is pretty clever in getting the selection down to O(1), however, how practical is it? Suppose I do something like:

      my %weight = ( dog=>100000000, cat=>120000000, pig=>40000000000 );
      Who would do that besides evil people (such as myself >:-))? Fair enough, but what about the example group? The sub is passed a reference to an array containing: (1, 1.25, 3.6, 2). The 1.25 would get only 1 index and 3.6 would only get 3. Of course, you could figure out the least common multiple quite easily, but what about those cases where you have very long decimals? Suppose the weight is determined by a calculation that may involve pi. You need a somewhat precise value for pi so you use 3.14159265358979. If you figure out the lcm, have fun generating an array that large :). If you can round the decimals, this certainly becomes less problematic. But what if there are 1000 indices that are weighted in this fashion? You would still have an array so large that perl would run out of memory. Also, when you need a higher precision, you need a higher precision and your solution becomes impractical.

      I'm not saying that your solution is wrong. I'm merely pointing out that it could become very expensive in terms of memory and that I would probably opt for the binary search solutions others are proposing.

      Updated: Fixed a typo. I need to learn how to spell :-/

      antirice    
      The first rule of Perl club is - use Perl
      The
      ith rule of Perl club is - follow rule i - 1 for i > 1

Re: Weighted random numbers generator
by Limbic~Region (Chancellor) on Mar 13, 2003 at 22:06 UTC
    spurperl,
    I have only briefly read the other replies, but I believe my solution may be unique. If it isn't, then I apologize.

    First let me solve your overly simplistic example, then give you the logic behind a more general solution:

    For example, I'd like to decide between 0, 1 and 2 randomly, with 0 and 1 having an equal chance to be chosen, and 2 having a chance that is twice higher.

    This can be reduced to generating a random number between 1 and 4 and then checking the results.

  • If the number is 1, then choose 0
  • if the number is 2 then choose 1
  • If the number is 3 or 4 then choose 2

    The more general approach to this is to calculate how many elements are required to cover the entire range, and then set up the range logic to check the results.

    I haven't provided any code because it is pretty straight forward, but if you would like to see some code that can generate the weighting dynamically - let me know as it less straight forward (but still not too difficult).

    Hope this helps - L~R

    Update: Just realized this is my 100th post - woo hoo

Re: Weighted random numbers generator
by Anonymous Monk on Mar 13, 2003 at 22:27 UTC

    In english, what you might want to do, is set boundaries. Let's choose difficult numbers to drive the point.

    Let's say you can easily generate a number between 0 and 5. Let's say you want to choose from the set (0,1,2,3) with weights (10,30,12,44).

    First, add up your weights. You'll get your fractions that way. 10/96, 30/96 etc etc.. Now find the distance between your numbers. 0->5 is 5 integers long.

    Now you wanna distribute, as floats, 0-5 to map to your set. You start with 0. 10/96 of 5 is 50/96. So 1 maps to \[0->50/96]. Now add your second boundary. 30/96 of 5 is 150/96. [50/96,150/96] is your second boundary. All I'm doing is finding the factions of the range (weights/whole * range of random numbers).

    Now choose a random number betwe 0 and 5. It'll generate something in between one of those ranges.

    [0,50/96) => 1, [50/96,150/96) => 2, on and on [some fraction,5] => 3
    Divide your range of random numbers into fractions proportioned by the weights.

    Here's a simple example. Say you could generate a number between 0 and 3, and your weights mapped like (1,2) with weights (1,2) respectively...

    a random number between 0 and 3 would map like this
    [0,3/3 * 3) => 1 [3/3 * 3, 3/3*3 + 2/3*3 ) => 2 or... [0,1] => 1, [1,3] => 2
    Find a random number bewtween 0 and 3.. p00f!

    edited: Fri Mar 14 23:34:05 2003 by jeffa - formatting

Re: Weighted random numbers generator
by demerphq (Chancellor) on Mar 13, 2003 at 23:09 UTC

    I think the below is functionally the same as yours:

    #!perl -l my %weights=(a=>1,b=>2,c=>3); my $max=0; my @values=map { my $range=[ $max, $max+$weights{$_}, $_ ]; $max+=$weights{$_}; $range } keys %weights; foreach (1..100) { my $r=rand($max); $_->[0]<=$r and $r<$_->[1] and print $_->[-1] foreach @values; }

    You could binsearch the list instead of linsearching it for a bit of speed up.


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
Re: Weighted random numbers generator
by Anonymous Monk on Mar 14, 2003 at 05:33 UTC
    Hi, I usually try to establish some kind of function (stepped or non-stepped) on probability. This can be quite an involved process if I'm, for instance, trying for a normal distribution. So (bear with my ideosyncrasies)-
    my @x = @_; my $y = 0; my $i = 0; until ($i > $#x) { $y = $y + $x[$i]; $i = $i + 1; } $x[0] = $x[0] / $y; $i = 1; until ($i > $#x) { $x[$i] = $x[$i - 1] + $x[$i] / $y; $i = $i + 1; } $y = rand(); # This bit constitutes the "function" $i = 0; until ($y < $x[$i]) { $i = $i + 1; } return $i;
      Woops! Should have placed greater focus on the word _establish_. To speed things up, I furnish the details of the function (in this case, $x$i within the "function" bit at the end), instead of re-evaluating it every time I call a subroutine. So, to clarify, say I use two subroutines, choose_weighted and weight_array.
      my $etc = 0; my @x = (1.3, 1.4, 900, .03, 7, $etc, $etc); @x = weight_array(@x); my $random_value = choose_weighted(@x); sub weight array { my @x = @_; my $y = 0; my $i = 0; until ($i > $#x) { $y = $y + $x[$i]; $i = $i + 1; } $x[0] = $x[0] / $y; $i = 1; until ($i > $#x) { $x[$i] = $x[$i - 1] + $x[$i] / $y; $i = $i + 1; } return @x; } sub choose_weighted { my $y = rand(); my $i = 0; until ($y < $x[$i]) { $i = $i + 1; } return $i; }
        Bugger. Spot the derro who forgot to change @x to @_ in choose_weighted.
Re: Weighted random numbers generator
by abell (Chaplain) on Mar 14, 2003 at 09:43 UTC
Re: Weighted random numbers generator
by AssFace (Pilgrim) on Mar 14, 2003 at 13:04 UTC
    There's the code from the Perl Cookbook:

    **UPDATE: here is a node that already mentions it (and the weight_to_dist sub that I didn't put here).
    ########################### #from Perl Cookbook 2.10 #take in a hash and via a weighted random, pick one of the #keys based on the value and return it ########################### sub weighted_rand { my %dist = @_; my ($key, $weight); my $rand; while (1) { # to avoid floating point inaccuracies $rand = rand; while ( ($key, $weight) = each %dist ) { return $key if ($rand -= $weight) < 0; } } } ###########################


    ------------------------------------
    There are some odd things afoot now,
    in the Villa Straylight.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://242744]
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2014-07-28 23:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (210 votes), past polls