Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^15: How likely is rand() to repeat?

by JavaFan (Canon)
on Mar 09, 2012 at 23:33 UTC ( [id://958811]=note: print w/replies, xml ) Need Help??


in reply to Re^14: How likely is rand() to repeat?
in thread How likely is rand() to repeat?

How long will it need to run before you are convinced that the OP is safe to use this?
Did I ever claim otherwise?

All I've been claiming is that if you're seeding a deterministic random number generator with k bits, you will not be getting more than 2k sequences, and that I've seen no evidence whatsoever that you can pull more out of thin air.

After generating 254 sets of 1 million 25-char keys, with each key being generated at a new seed position, NO DUPS SEEN!
254 sets of 1 million keys? Call me confused, but:
for my $run ( 1 .. 1e6 ) { ... for my $id ( 1 .. 1e6 ) { srand ( $run + $id ); ... } }
Aren't you generating a million sets of a million keys each? But with only 2 million different keys, as $run + $id ranges from 2 .. 2e6.

But never mind that. I've never questioned that picking a different seed is very likely to give you a different key. My claim is, that if you have only 232 different seeds you will not get more than 232 keys this way. You exercise showed that if you pick a million different seeds, you get a million different keys. And you repeated that experiment a million times, each time removing a single seed from the set of a million, and adding another one.

Replies are listed 'Best First'.
Re^16: How likely is rand() to repeat?
by BrowserUk (Patriarch) on Mar 09, 2012 at 23:38 UTC

    More to your satisfaction?

    #! perl -slw use strict; use Math::Random::MT qw[ rand srand ]; $|++; my @c = ( 'A'..'Z', 'a'..'z', 0..9 ); for my $run ( 1 .. 1e6 ) { my %seen; keys %seen = 1e6; printf "\rrun: $run"; for my $id ( 1 .. 1e5 ) { srand( $run * $id ); for( 1 .. 10 ) { my $ID = join '', map $c[ int rand( 62 ) ], 1 .. 25; warn "Dup $ID at $run/$id" if exists $seen{ $ID }; undef $seen{ $ID }; } } } __END__ C:\test>"IDrand(62)x25.pl" run: 54

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      More to your satisfaction?
      There's no need to satisfy me.

      I just don't see what point you're trying to make. You've been arguing against my viewpoint that having a seed of k bits cannot lead to more than 2k sequences. I fail to see how your exercise supports your argument, or contradicts mine.

        {Sigh}. All the way back to the beginning. Your original post said:

        Your random number generator is likely to only have 24, 32, 48 or 64 bits of entropy. Which means that most of the possible strings will never be generated. Even with 64 bits of entropy, there are only 18446744073709551616 strings possible.

        That strongly (and wrongly) implies that if your rand() produces 64-bit values and takes a 64-bit seed, that it only has 64-bits of entropy.

        As I demonstrated with my toy 8-bit, the MSVC 31-bit, and MT19937 PRNGs, neither nor both of the former, imply the latter.

        It also stated:

        Which makes is far more likely for collisions to happen.

        But the difference between 0.00000000000000000000000000000000066% & 0.0000000000000000000000000000000066% hardly constitutes "far more likely".

        Then a couple of posts later you said:

        If srand takes a 64 bit number as argument, then there are at most 2^64 sequences of return values of rand() possible.

        Which strongly (and wrongly) implies that the OP could only get 2^64 unique strings using a rand that uses a 64-bit argument to srand.

        As I've demonstrated, that is only true if he only generates 1 string per starting point. Producing two per, doubles it. Etc.

        Your first post also stated:

        In fact, Wikipedia tells us that with a 64 bit hash, there's a one procent chance you have a collision if you have 6.1E8 strings.

        This time, strongly stating, that the OP can expect a collision after generating 610 million keys. Which is wrong in so many ways.

        • The math of a 64-bit hash is completely unrelated to a rand(). Especially one with 640 dwords of state.

          The odds for even a straight 64-bit rand are far higher.

        • By diminution, implies that 1/610 million is somehow high risk.

          Even were it true, it would be "vanishingly small" for an application that only needs < 1 million strings.

        The purpose of this exercise (the code above) was to demonstrate, in a way that anyone could reproduce, that the scare-mongering you've done in this thread has no basis in reality. I hoped that by inviting your opinion/modifications to the demonstration code, to finally end this thread with a tangible conclusion on which we might both agree, rather than the most unsatisfactory one of disparate opinions and bunch of very big/very small numbers.

        Beyond reporting back on the results of running the last version of the demo code, I think that this thread has nowhere left to go.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://958811]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (8)
As of 2024-03-28 11:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found