Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Random Picks

by knobunc (Pilgrim)
on May 09, 2001 at 18:00 UTC ( #79084=note: print w/ replies, xml ) Need Help??


in reply to Random Picks

[Edited to explain the theory more completely.]

Lifted from the Perl Cookbook recipe 8.6 'Picking a Ranom Line from a File' generalized out to N lines rather than 1. Most deep thought comes from Messrs. Christiansen and Torkington. All errors come from me.

# Call srand if you are on an old perl (< 5.004) my $statement = 'SELECT ssn FROM ...'; my $sth = $dbh->prepare($statement); $sth->execute(); my $target_lines = 1000; my @result = (); my $item_num = 1; while (my ($ssn) = $sth->fetchrow_array()) { if (@result < $target_lines) { # Just add it if we have not hit the limit push @result, $ssn; } elsif (rand($item_num) < $target_lines) { # Otherwise pick something to replace if the # the odds are right $result[rand($target_lines)] = $ssn; } } continue { $count++; }

You can look in the Cookbook for the full description of why this works for a single target_line. The short form is that it will first fill the list with $target_lines elements, then each subsequent line will have a slightly lessened chance of replacing a predecessor. Basically every element in the array should have the same probability of being there at a given point in time. So until the 1001st element each item has a 100% chance of being there. Then the 1001st element has a 1001/1000 chance of replacing something, then there is a 1/1000 chance for each element. Adding that all up it means each element will have a 1001/1000 chance of staying. Then for the 1002nd element, it has a 1002/1000 chance of replacing an element. Again that all adds up to 1002/1000 (1/1000 + 1001/1000) for each element.

The standard caveats about the randomness of the random numbers returned from random apply.

This solution is good if you don't want to slurp all elements in from the DB and if you have to use standard SQL (i.e. no limit clause).

-ben


Comment on Re: Random Picks
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://79084]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (13)
As of 2014-09-23 17:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (234 votes), past polls