http://www.perlmonks.org?node_id=917225

chakreey has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am trying to generate set of unique random numbers. Here is my script for doing this

use strict; use warnings; my (@RandomSet,$random_number, $element_of_array, $count); foreach(1..10){ $random_number = int(rand(1185)); $count=0; foreach $element_of_array (@RandomSet){ if ($random_number == $element_of_array){ $count++; } } push (@RandomSet,$random_number) if $count==0; } foreach $element_of_array (@RandomSet){ print "$element_of_array\n"; }

for every new random number generated, I am finding out if it is unique or a repeat. Is there a better/faster way to achieve this ?

TIA

Replies are listed 'Best First'.
Re: Is there a better way to generate unique set of random numbers ?
by moritz (Cardinal) on Jul 28, 2011 at 09:44 UTC
    I'd do it like this:
    use strict; use warnings; my @random_set; my %seen; for (1..10) { my $candidate = int rand(1185); redo if $seen{$candidate}++; push @random_set, $candidate; } print join(', ', @random_set), "\n";

    If you need to generate many more than 10 distinct random numbers, this solution will be faster because the hash lookup works in constant time, whereas your solution has to iterate over the whole array for each number. For 10 there won't be a big difference.

    Also note that for testing it is better to reduce the maximum number so that collision become more likely - I tested it with 12 instead of 1185.

    Update: If the maximal number is the same or not much larger than the number of random values you want, this would probably be faster:

    use List::Util qw/shuffle/; my @random_set = (shuffle 0..11)[0..9];

    That is, first generating a random permutation, and then selecting some of the elements.

    But as always it's best to Benchmark the different solutions with real-world data.

      If you need to generate many more than 10 distinct random numbers, this solution will be faster because the hash lookup works in constant time, whereas your solution has to iterate over the whole array for each number. For 10 there won't be a big difference.
      But do note that the only reason your algorithm guarantees to terminate is because rand uses a pseudo random number generator. Were rand to be truly random, neither algorithm could guarantee to be finished before Perl 6 dominates the world.

      Here's an algorithm that does terminate:

      use List::Util 'shuffle'; my @set = (shuffle(0..1184))[0..9];
        Were rand to be truly random, neither algorithm could guarantee to be finished before Perl 6 dominates the world.

        Though if you do the math and calculate the probability that either algorithm does not terminate with the given parameters in, let's say, 2 seconds on a modern machine, you'll probably find that the chance of computation errors in the CPU caused by cosmic rays is much higher.

        Doing the strict theory only makes sense if the actual machine corresponds to the machine model that the theory assumes.

      Thank you

      I am still a beginner in perl, can you help me understand $seen{$candidate}++ in line:

      redo if $seen{$candidate}++;
        %seen is a hash. If the key $candidate is not present in the hash, it is added to the hash (that's called "autovivification"), initialized to 1, and undef is returned - the redo is not executed.

        On the other hand if the key is already present in the hash, the corresponding value (for example 1) is returned, and incremented by one (for example set to 2). In this case the interesting part is not the incrementing, but that it returns a true value, so the redo is executed.

Re: Is there a better way to generate unique set of random numbers ?
by ambrus (Abbot) on Jul 28, 2011 at 10:14 UTC
Re: Is there a better way to generate unique set of random numbers ?
by Anonymous Monk on Jul 28, 2011 at 12:55 UTC
    A PRNG will basically never "exactly repeat," and a sufficiently long string made from one won't either. You could trap key-violations on SQL inserts to catch any cosmic rays. "The way" to detect dupes in a large dataset is to disk-sort it.
Re: Is there a better way to generate unique set of random numbers ?
by FloydATC (Deacon) on Jul 29, 2011 at 21:50 UTC
    I have another suggestion that would seem slightly more linear than stabbing away at a %seen hash:

    my @numbers = (1 .. 10); while (@numbers) { my $index = rand(@numbers); my $pick = splice(@numbers, $index, 1); print "$pick\n"; }
    Edit: The first version had an off-by-one error :o)
    -- Time flies when you don't know what you're doing
Re: Is there a better way to generate unique set of random numbers ?
by pid (Monk) on Jul 29, 2011 at 07:46 UTC

    In case you want a oneliner to solve this, here I paste one that I came up with :)

    $ perl -le '@a{map{int rand 1185}(1..10)}=();print for keys %a' 373 304 286 210 816 777 596 18 766 880