http://www.perlmonks.org?node_id=158689

in reply to Rolling a biased die

While IO's solution is really cool, and really simple, it's run time is O(n) (where n = the number of distinct die values). More specificly it is EXACTLY n (i don't remember what that's called ... Theta? Omega?) -- it can never do better, because it allways iterates over all the possible die values. If you are rolling the same die over and over and over that may not be good enough for you (you asked for O(1) ... let's see if we can get there.)

abell's approach is a lot closer to a true O(1) solution. It requires some initialization cost and more space to build a lookup table -- but depending on the how you rank memory/speed it may be worth it. (especially if all of your frequencies are 'relatively' close, requiring you to have 'relatively' few buckets for each value after you've multipled the frequencies by the neccessary factor to get them above 1. The problem is: It's still not truely O(1). For a lot of distrobutions, it will be really close, but you still have to pay an extra cost to resolve th conflicts on all the transition points -- it may seem like that would be inconsequential to the run time, but it really depends on what is important to you. (Example: If you roll the die m times, and your frequencies are such that 1 roll out every k requires a transition point, and picking the riht number at a transtion requires t time, then the run time is O(1 + t(m/k)) ... even if t is 1 (constant time) that's breaks down to O(m/k) which is probably worse then O(n))

(UPDATE: abell made me see the light, his way is O(1) ... and requires a lot less space then the one below, even if the frequencies require that (almost) every bucket be a transition point.)

The only approach I know of that's truely O(1) for every lookup requires that all of the frequency values be integers, such that you can build an array (similiar to the one abell suggested) where each die value has the same number of buckets as it's frequency, and you just pick a random integer value less then the size of the array to use as an index for looking up your die value. This will allways be a true O(1) lookup time cost, but the memory costs can be seriously prohibitive. In MeowChow's orriginal six sided example, converting the frequencies to interger values requires an array of 1322822219 (31 + 20234 + 17 + 1542232 + 1321249563 + 10142) buckets. This appraoch is hardless worth using unless the frequencies are all small integer values and n is "big but not too big" (ie: n ~ 100, and most of the frequencies are 1 with a few 2s 3s, and other small integers)

All in all, IO's solution is the best when you have no idea what n will be, or what the distrobution of frequencies will be. BUT! ... If you know at coding time what n & the frequencies are, and n is relatively small, you can probably get the fastest possible lookup time by hardcodding in a 'switch statement'. If the frequencies are heavily skewed towards a few values, this should definitely give you a performance boost for the "common roll".