Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re: Random Picks

by knobunc (Pilgrim)
on May 09, 2001 at 18:00 UTC ( #79084=note: print w/replies, xml ) Need Help??

in reply to Random Picks

[Edited to explain the theory more completely.]

Lifted from the Perl Cookbook recipe 8.6 'Picking a Ranom Line from a File' generalized out to N lines rather than 1. Most deep thought comes from Messrs. Christiansen and Torkington. All errors come from me.

# Call srand if you are on an old perl (< 5.004) my $statement = 'SELECT ssn FROM ...'; my $sth = $dbh->prepare($statement); $sth->execute(); my $target_lines = 1000; my @result = (); my $item_num = 1; while (my ($ssn) = $sth->fetchrow_array()) { if (@result < $target_lines) { # Just add it if we have not hit the limit push @result, $ssn; } elsif (rand($item_num) < $target_lines) { # Otherwise pick something to replace if the # the odds are right $result[rand($target_lines)] = $ssn; } } continue { $count++; }

You can look in the Cookbook for the full description of why this works for a single target_line. The short form is that it will first fill the list with $target_lines elements, then each subsequent line will have a slightly lessened chance of replacing a predecessor. Basically every element in the array should have the same probability of being there at a given point in time. So until the 1001st element each item has a 100% chance of being there. Then the 1001st element has a 1001/1000 chance of replacing something, then there is a 1/1000 chance for each element. Adding that all up it means each element will have a 1001/1000 chance of staying. Then for the 1002nd element, it has a 1002/1000 chance of replacing an element. Again that all adds up to 1002/1000 (1/1000 + 1001/1000) for each element.

The standard caveats about the randomness of the random numbers returned from random apply.

This solution is good if you don't want to slurp all elements in from the DB and if you have to use standard SQL (i.e. no limit clause).


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://79084]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2017-12-17 04:42 GMT
Find Nodes?
    Voting Booth?
    What programming language do you hate the most?

    Results (462 votes). Check out past polls.