Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Caveat I am not a mathematician but this excited me!

I also was thinking of normalizing all n weights and partioning the number line into n contiguous segments from 0 to 1, and then of course limiting decimal precision of the weights would allow the array-based algorithm mentioned above.

I don't know how intensive random number generation is (maybe use a preprepared list..) but I got to thinking about defining a set of geometric regions, each the area of which is one of your weights. This was borne out by my looking at documentation for the recently updated GNU Scientific Library.

You can pick random numbers and ignore those which fall outside the region. For example you can choose a random unit vector by picking random numbers until you get two which fall inside a unit circle. It would seem you could do the same thing by either partitioning the circle into regions with an appropriate area, for example into pie slices or maybe swiss-cheese holes, which would let you provide a couple preset weights and a default "otherwise" case which is the cheese. Of course then you get into circle packing problems.. My wierder visualization was to start with a unit circle, or regular polygon, and add triangles around the circumference for the floating point. But obviously you could just use a circle and pick good angles to cut the pizza pie. (Sorry my pies are pizza pies..)

Likewise you can divide a unit square into areas that match your weights.

Then thought, perhaps the weights you desire fit the mathematical curve of a known distribution, which might already exist as C code. So..

The recently updated GNU Scientific Library has a huge list of wonderfully named distributions (wish I had enough math or time to learn math to know them) which might be useful for you. But check out General Discrete Distributions on preprocessing the probability list into a lookup table.

First they discuss making a cumulative probability table where C[0] = 0; C[k+1] = C[k]+P[k]; C[k] = 1 which requires log K steps per random number generation.

Then a very clever method by Marsaglia (1963) is mentioned briefly which for large K (K being the number of discrete events with different probabilities) required a huge lookup table. They finally recommend an efficient procedure devised in 1977 by Alistair J. Walker and covered by Knuth. Since I can't improve on their explanation, here it is:

This requires two lookup tables, one floating point and one integer, but both only of size K. After preprocessing, the random numbers are generated in O(1) time, even for large K. The preprocessing suggested by Walker requires O(K^2) effort, but that is not actually necessary, and the implementation provided here only takes O(K) effort. In general, more preprocessing leads to faster generation of the individual random numbers, but a diminishing return is reached pretty early. Knuth points out that the optimal preprocessing is combinatorially difficult for large K.

It would be really neat if you could provide us with a Perl interface (maybe exists?) to the relevant GSL functions, which would seem to be the four gsl_ran_discrete_xxx() functions. It takes an arbitrary length array of positive numbers of double precision and you can just hand it your weights and it normalizes for you. You build a table first which then is used to generate random numbers which exhibit the weighted probability you described. Math::Gsl exists, should be able to access through that.. though some things may not be implemented it seems to say.

Or maybe just snip the relevant code and use Inline::C? Would like to see what that looks like.

Incidentally the GSL section on Shuffling and Sampling would seem to be quite useful to Perl people.

Maybe math people will be interested in an interesting-looking talk Donald Knuth is going to give at Stanford U. April 23 on a very efficient method of topological sorting. He is providing a preview as 64 pages of gzipped postscript which I was able to view on a windows computer with an online viewer, cached (I hope) here. Knuth says this method has a big impact on computer science. Perhaps a math type would like to mention a couple salient points in it for the rest of us?

There are other packages out there especially R, listed on the Goose page

Cheers,
Matt

Note: The CPAN link above may work for you, I can't see it since CPAN has blackholed this cafe. Would like to hear from anyone who has used Perl bindings to the GSL.


In reply to Re: Rolling a biased die by mattr
in thread Rolling a biased die by MeowChow

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (6)
As of 2024-04-23 17:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found