PerlMonks

### Sampling From a Histogram Distribution

on Jan 05, 2014 at 20:49 UTC
jabowery has asked for the wisdom of the Perl Monks concerning the following question:

Let's say I've got a probability distribution histogram represented as a hash:

{apples=>4,oranges=>19,pairs=>10,peaches=>5}

What is the best way to simulate random sampling from that histogram? I looked around CPAN for Monte Carlo methods, and the only supported distributions I found were math-based ones, such as those in Math::GSL::Randist, not data-based ones.

Replies are listed 'Best First'.
Re: Sampling From a Histogram Distribution
by Kenosis (Priest) on Jan 05, 2014 at 21:32 UTC

Perhaps the following will be helpful:

```
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;

my ( @array, %histRand );

my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

# Expand the histogram: each key appears as many times as its count.
push @array, ($_) x $hist{$_} for keys %hist;
@array = shuffle @array;

# Draw scalar(@array) samples with replacement and tally them.
$histRand{ $array[ rand $#array + 1 ] }++ for 0 .. $#array;

print Dumper \%histRand;
```

Output of a run:

```
$VAR1 = {
'oranges' => 18,
'peaches' => 5,
'apples' => 4,
'pairs' => 11
};
```

Is shuffling essential?

Excellent question. I initially thought about that; likely not, since the selection is already 'random.' However, I left it in to help with 'randomness'--if that's even possible here...
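As a minimal sketch of the point above (not part of the original reply): each draw picks a uniformly random index into the expanded array, so the order of the array should not affect the result and the shuffle can be dropped. The names `@bag` and `%drawn` and the sample size of 1000 are assumptions made for illustration.

```perl
use strict;
use warnings;

my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

# Expand each label by its count. The order of @bag is irrelevant,
# because every draw picks a uniformly random index into it.
my @bag = map { ($_) x $hist{$_} } sort keys %hist;

# Sample with replacement: rand @bag is uniform in [0, scalar @bag),
# and the array subscript truncates it to an integer index.
my %drawn;
$drawn{ $bag[ rand @bag ] }++ for 1 .. 1000;

print "$_ => $drawn{$_}\n" for sort keys %drawn;
```

The counts vary from run to run, but they should stay roughly proportional to 4 : 19 : 10 : 5.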

Re: Sampling From a Histogram Distribution
by davido (Archbishop) on Jan 06, 2014 at 03:58 UTC

Another approach: Use Bytes::Random::Secure's string_from method. Supply it with the following "bag" string: "aaaaoooooooooooooooooooppppppppppccccc", and draw as many random characters as you need. The distribution will be appropriately weighted.

Update: Here's how it would look. I'm not convinced that I care for it aesthetically, but the randomness is high quality, and the source is well tested.

```
use Bytes::Random::Secure;

my $rng    = Bytes::Random::Secure->new( NonBlocking => 1 );
my %weight = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );
my %map    = ( qw/ apples a oranges o pairs p peaches c / );
my %rmap   = reverse %map;

my $bag = join '', map { ( $map{$_} ) x $weight{$_} } keys %weight;

print $rmap{$rng->string_from($bag,1)}, $_ % 8 == 0 ? "\n" : "\t"
for 1 .. 100;

print "\n";
```

Sample output:

```
$ ./mytest.pl
pairs    oranges    oranges    oranges    pairs    apples    oranges    apples
oranges    pairs    oranges    oranges    pairs    peaches    peaches    apples
peaches    pairs    oranges    oranges    oranges    pairs    oranges    oranges
peaches    peaches    peaches    oranges    oranges    peaches    oranges    oranges
... and so on ...
```

When that module was created, it was an intentional design decision that duplicate characters in the "bag" string would increase the weighting of those characters.

Dave

Re: Sampling From a Histogram Distribution
by educated_foo (Vicar) on Jan 06, 2014 at 00:53 UTC
Assume you have a uniform random number generator, like Perl's rand. Just scale its output to the range 1..(4+19+10+5), then assign each of your four things a suitably sized chunk of that range (apples == 1..4, oranges == 5..23, etc.).
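The chunked-range idea above could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the reply: it uses the half-open range [0, 38) returned by `rand` instead of the integers 1..38, and the helper name `draw_label` and the arrays `@labels` and `@upper` are hypothetical.

```perl
use strict;
use warnings;

# Histogram from the original question (counts act as weights).
my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

# Build cumulative upper bounds once. Keys are sorted only so that
# the chunk layout is the same on every run: 4, 23, 33, 38.
my ( @labels, @upper );
my $total = 0;
for my $label ( sort keys %hist ) {
    $total += $hist{$label};
    push @labels, $label;
    push @upper,  $total;
}

# Hypothetical helper: return one label with probability count/total.
sub draw_label {
    my $r = rand($total);    # uniform in [0, $total)
    for my $i ( 0 .. $#upper ) {
        return $labels[$i] if $r < $upper[$i];
    }
    return $labels[-1];      # not reached when $total > 0
}

my %seen;
$seen{ draw_label() }++ for 1 .. 10_000;
printf "%-8s %5d\n", $_, $seen{$_} // 0 for @labels;
```

Unlike the expanded-array approaches, this needs memory proportional to the number of bins, not the sum of the counts, and it would also work with non-integer weights. For many bins, the linear scan could be replaced by a binary search over `@upper`.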
Re: Sampling From a Histogram Distribution
by Cristoforo (Curate) on Jan 05, 2014 at 21:10 UTC
Maybe this solution from Stack Overflow, which I think is similar to what you're seeking.

Node Type: perlquestion [id://1069417]