
Sampling From a Histogram Distribution

by jabowery (Beadle)
on Jan 05, 2014 at 20:49 UTC ( #1069417=perlquestion )
jabowery has asked for the wisdom of the Perl Monks concerning the following question:

Let's say I've got a probability distribution histogram represented as a hash:


What is the best way to simulate random sampling of that histogram? I looked around CPAN for Monte Carlo methods and the only distributions I found supported were math-based such as Math::GSL::Randist -- not data-based.

Replies are listed 'Best First'.
Re: Sampling From a Histogram Distribution
by Kenosis (Priest) on Jan 05, 2014 at 21:32 UTC

    Perhaps the following will be helpful:

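    use strict;
    use warnings;
    use List::Util qw/shuffle/;
    use Data::Dumper;

    my ( @array, %histRand );
    my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

    # Expand each key into as many array elements as its count.
    push @array, ($_) x $hist{$_} for keys %hist;
    @array = shuffle @array;

    # Draw one random element per slot and tally the draws.
    # Note: rand $#array + 1 parses as rand( $#array + 1 ).
    $histRand{ $array[ rand $#array + 1 ] }++ for 0 .. $#array;

    print Dumper \%histRand;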

    Output of a run:

    $VAR1 = {
              'oranges' => 18,
              'peaches' => 5,
              'apples' => 4,
              'pairs' => 11
            };
      Is shuffling essential?

        Excellent question. Initially thought about that; likely not, since selection is 'random.' However, left it in to help with 'randomness'--if that's even possible here...
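
        A minimal sketch of the same idea without the shuffle, assuming the example histogram used above. Each draw picks a uniformly random index, so the order of elements in the array should not affect the result. The helper name draw is hypothetical and not from the thread.

```perl
use strict;
use warnings;

my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

# Expand the histogram into a flat "bag" with one element per count.
# No shuffle: the bag stays grouped by key.
my @bag = map { ($_) x $hist{$_} } sort keys %hist;

# draw() is a hypothetical helper: pick one element uniformly at random.
# rand @$bag returns a value in [0, number of elements); the array
# subscript truncates it to an integer index.
sub draw {
    my ($bag) = @_;
    return $bag->[ rand @$bag ];
}

# Tally 10,000 draws (with replacement) and print the counts.
my %seen;
$seen{ draw( \@bag ) }++ for 1 .. 10_000;
printf "%-8s %5d\n", $_, $seen{$_} for sort keys %seen;
```

        Over many draws the tallies should come out roughly proportional to 4 : 19 : 10 : 5, whether or not the bag is shuffled first.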

Re: Sampling From a Histogram Distribution
by davido (Cardinal) on Jan 06, 2014 at 03:58 UTC

    Another approach: Use Bytes::Random::Secure's string_from method. Supply it with the following "bag" string: "aaaaoooooooooooooooooooppppppppppccccc", and draw as many random characters as you need. The distribution will be appropriately weighted.

    Update: Here's how it would look. I'm not convinced that I care for it aesthetically, but the randomness is high quality, and the source is well tested.

    use Bytes::Random::Secure;

    my $rng = Bytes::Random::Secure->new( NonBlocking => 1 );

    my %weight = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );
    my %map    = ( qw/ apples a oranges o pairs p peaches c / );
    my %rmap   = reverse %map;

    # Build the bag string: each fruit's letter repeated by its weight.
    my $bag = join '', map { ( $map{$_} ) x $weight{$_} } keys %weight;

    # Draw 100 samples, printing eight per line.
    print $rmap{ $rng->string_from( $bag, 1 ) }, $_ % 8 == 0 ? "\n" : "\t"
        for 1 .. 100;
    print "\n";

    Sample output:

    $ ./
    pairs    oranges  oranges  oranges  pairs    apples   oranges  apples
    oranges  pairs    oranges  oranges  pairs    peaches  peaches  apples
    peaches  pairs    oranges  oranges  oranges  pairs    oranges  oranges
    peaches  peaches  peaches  oranges  oranges  peaches  oranges  oranges
    ... and so on ...

    When that module was created, it was an intentional design decision that duplicate characters in the "bag" string would increase the weighting of those characters.


Re: Sampling From a Histogram Distribution
by educated_foo (Vicar) on Jan 06, 2014 at 00:53 UTC
    Assume you have a uniform random number generator, like Perl's rand. Just scale its output to the range 1..(4+19+10+5), then assign each of your four things a suitably-sized chunk of that range (apples == 1..4, oranges == 5..23, etc.).
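
    The chunked-range idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name sample_hist is hypothetical, the counts are taken to be non-negative integers, and the keys are sorted only so that each key's chunk of the range is stable between calls.

```perl
use strict;
use warnings;
use List::Util qw/sum/;

my %hist = ( apples => 4, oranges => 19, pairs => 10, peaches => 5 );

# sample_hist() is a hypothetical helper: return one key with
# probability proportional to its count.
sub sample_hist {
    my ($hist) = @_;
    my $total = sum values %$hist;       # 38 for the example histogram
    my $pick  = 1 + int rand $total;     # uniform integer in 1..$total

    # Walk the keys, subtracting each count; the key whose chunk
    # contains $pick is the one that brings it to zero or below.
    for my $key ( sort keys %$hist ) {
        $pick -= $hist->{$key};
        return $key if $pick <= 0;
    }
    die "no key selected; are all counts zero?";
}

# Tally 10,000 draws and print the counts.
my %seen;
$seen{ sample_hist( \%hist ) }++ for 1 .. 10_000;
printf "%-8s %5d\n", $_, $seen{$_} for sort keys %seen;
```

    Unlike the expanded-array approaches, this never builds a list with one element per count, so memory use depends only on the number of keys.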
Re: Sampling From a Histogram Distribution
by Cristoforo (Curate) on Jan 05, 2014 at 21:10 UTC
    Maybe this solution from Stack Overflow, which I think is similar to what you're seeking.
