PerlMonks  

Re^6: How likely is rand() to repeat?

by BrowserUk (Patriarch)
on Mar 09, 2012 at 05:48 UTC


in reply to Re^5: How likely is rand() to repeat?
in thread How likely is rand() to repeat?

Given just four different values for the seed, how can you pick from 24,

I didn't say it could generate all those sequences. Only that from any given starting point, the non-repeating sequence could be any permutation of those 24 permutations.

Sure. But how many different such sequences can it make?

That's the wrong question. When generating the OP's 25-char sequences, you don't re-seed before starting each new sequence. You seed (implicitly) once and then follow that sequence until you have enough.

Therefore the upper bound is the length of the non-repeating sequence (the period) the prng can generate. (4.31e+6001 in the Mersenne Twister).

Of course, that is further constrained by the modulo operation that brings the generated random values into the 0 .. 61 range. Hence 6.45e44.
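The two figures quoted above can be checked with base-10 logarithms. This is a minimal sketch, assuming the 6.45e44 figure is 62**25 (25 characters from a 62-symbol alphabet) and the period figure is 2**19937 - 1; the helper name `sci` is made up for illustration.

```perl
use strict;
use warnings;

# Format 10**$log10 as mantissa/exponent without ever building the huge number.
sub sci {
    my ($log10) = @_;
    my $exp = int $log10;                       # integer part is the exponent
    return sprintf '%.2fe%d', 10 ** ( $log10 - $exp ), $exp;
}

# Assumption: 25 characters, each drawn from 62 symbols (the 0 .. 61 range).
my $strings = sci( 25 * log(62) / log(10) );

# Assumption: the Mersenne Twister period 2**19937 - 1 (the "- 1" is negligible).
my $period = sci( 19937 * log(2) / log(10) );

print "distinct 25-char strings: $strings\n";
print "MT19937 period:           $period\n";
```

Rounding in the last printed digit may differ slightly from a truncated figure.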

For the 15-bit RCPRNG built into perl on win32, the period (at least when seeded with 1) seems, by experiment, to be 214741815.

Which looks suspiciously close to 2^31, but not quite.
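Measuring a period by experiment can be sketched with Brent's cycle detection. Perl's built-in rand() does not expose its state, so this example assumes a stand-in: a hypothetical 16-bit linear congruential generator whose constants are chosen only for illustration. It is not the win32 generator.

```perl
use strict;
use warnings;

# Hypothetical toy LCG with a 16-bit state (NOT perl's win32 rand()).
my $step = sub { ( $_[0] * 25173 + 13849 ) % 65536 };

# Brent's algorithm: returns the length of the cycle that the
# state sequence x0, f(x0), f(f(x0)), ... eventually falls into.
sub cycle_length {
    my ( $f, $x0 ) = @_;
    my ( $power, $lam ) = ( 1, 1 );
    my $tortoise = $x0;
    my $hare     = $f->($x0);
    while ( $tortoise != $hare ) {
        if ( $power == $lam ) {        # start a new power-of-two window
            $tortoise = $hare;
            $power *= 2;
            $lam = 0;
        }
        $hare = $f->($hare);
        $lam++;
    }
    return $lam;
}

my $period = cycle_length( $step, 1 );
print "period of the toy generator from seed 1: $period\n";
```

The same approach only works on a real generator if its full internal state can be read; comparing output values alone can report a repeat long before the state actually cycles.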


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^7: How likely is rand() to repeat?
by JavaFan (Canon) on Mar 09, 2012 at 10:37 UTC
    Only that from any given starting point, the non-repeating sequence could be any permutation of those 24 permutations.
    No. With a seed/state of 2 bits, for any given implementation, and any given starting point, only 4 out of the 24 permutations are possible. Think about it. You're working on a deterministic machine. You have only 4 different begin states. How can you have more than 4 end states?

    If you think I'm wrong, show an algorithm that proves otherwise. Given a 2-bit state, that shouldn't be overly complicated.
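The counting argument can be tried directly. This is a minimal sketch, assuming a hypothetical toy generator whose entire state is one 2-bit value and a Fisher-Yates shuffle driven by it; neither comes from the thread. Whatever update rule is substituted, four start states can yield at most four shuffles.

```perl
use strict;
use warnings;

# Hypothetical toy generator: the whole state is a single value in 0 .. 3.
sub make_rng {
    my ($seed) = @_;
    my $state = $seed & 3;
    return sub {
        $state = ( $state * 5 + 3 ) & 3;    # update never leaves 2 bits
        return $state;
    };
}

# Fisher-Yates shuffle that takes all its randomness from $rng.
sub shuffle_with {
    my ( $rng, @items ) = @_;
    for my $i ( reverse 1 .. $#items ) {
        my $j = $rng->() % ( $i + 1 );
        @items[ $i, $j ] = @items[ $j, $i ];
    }
    return join '', @items;
}

# Try every possible seed and count the distinct permutations produced.
my %seen;
$seen{ shuffle_with( make_rng($_), qw(a b c d) ) }++ for 0 .. 3;
my $distinct = keys %seen;
print "distinct permutations reachable: $distinct of 24\n";
```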

    That's the wrong question. When generating the OP's 25-char sequences, you don't re-seed before starting each new sequence. You seed (implicitly) once and then follow that sequence until you have enough.
    Eh, no, it's the right question. As you immediately say after stating "it's the wrong question", a sequence is produced. One that isn't reseeded. So, the question is indeed, "how many different sequences can be produced".
    Therefore the upper bound is the length of the non-repeating sequence (the period) the prng can generate. (4.31e+6001 in the Mersenne Twister).
    No, it's not.

    Here's another generator, with the same period as the Mersenne Twister, in pseudo code:

    use bigint;
    my $state = 0;
    sub rand {
        $state = ($state + 1) % (2 ** 19937 - 1);
        $state & 0xFFFFFFFF;
    }

    It's a simple generator, but produces numbers in the range 0 .. 2^32-1, and has a sequence length of 2^19937-1 before it repeats itself. It requires an internal state of about 19937 bits, but has no seed (0 bits).

    So, I claim, on each run of the program that uses the above implementation of random, you get one of 2^0 == 1 different sequences.

    Now, you may have a point if the OP was generating all the passwords he may ever require in his life, in a single run of the program. Then the number of different produced strings depends on the size of the state that the generator keeps. Which, for a typical PRNG is 32, 48 or 64 bits. For MT19937, the internal state is 19968 bits (smallest multiple of 32 greater than 19937). You'd need a seed of that size if you want to carry this information over to different runs of the program.

      If you think I'm wrong, show an algorithm that proves otherwise. Given a 2-bit state, that shouldn't be overly complicated.

      2 bits is clumsy. I hope you'll accept an 8-bit rand algorithm that demonstrates a greater than 256 period?

      #! perl -slw
      use strict;
      use Data::Dump qw[ pp ];

      {
          my @x = (0x00011011) x 24;
          my $x = 0;
          sub srand8 { $x = $_[0] % 24; }
          sub rand8 {
              $x = ++$x % 24;
              $x[ $x ] = ( $x[ $x ] * 33 + 251 ) & 255;
              return $x[ $x ];
          }
      }

      our $L //= 1e4;
      our $S //= 1;

      srand8( $S );
      my $s = '';
      $s .= pack 'C*', map rand8(), 1 .. 256 for 1 .. ($L/256+1);
      print length $s;

      $s =~ m[(.{256}).*?(\1)]sm
          and print "Sequence at [ $-[1], $-[1] ] repeats at [ $-[2], $+[2] ]";

      __END__
      C:\test>rand8 -S=1
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      C:\test>rand8 -S=2
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      C:\test>rand8 -S=3
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      C:\test>rand8 -S=4
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      C:\test>rand8 -S=5
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      C:\test>rand8 -S=255
      10240
      Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

      That 6144 period could probably be improved upon with some time spent tweaking the constants, but it is hardly over-complicated.

      Now, you may have a point if the OP was generating all the passwords he may ever require in his life, in a single run of the program.

      Okay. Half way there. :)

      That is what I assumed he was doing. I felt (still feel) that was his intent from reading the OP. But, you might be right that he intends generating them piecemeal. Or on-demand.

      Using the 32-bit MT, as you've said, there are 2**32 starting points. That's 4e9 starting points into a non-repeating sequence of 4e6001.

      Assuming he allows it to self-seed -- no srand() -- even if perchance two of his runs picked adjacent seed-points in the sequence, on average, he'd have to generate 4e6001 / 4e9 = 1e5992 rands before the two sub-sequences would overlap.

      So (ignoring the birthday paradox, imperfect PRNG etc. for a moment), for him to get a dup, he would have to run his program 2**32 times and pick exactly 1 sequence each time. But if he generates 10 each time, that's 10 * 2**32 sequences before he gets a dup.



        I hope you'll accept an 8-bit rand algorithm that demonstrates a greater than 256 period?
        Sure.
        my @x = (0x00011011) x 24;
        But that's not 8 bits. You keep a state using 768 bits. You've no dispute from me that you can create long periods from that. Busy Beavers can go through an amazing number of steps with just very limited memory to keep state on. A trivial counter using a rollover can go through 2^768 values before repeating itself.

        However, considering that you are using 8-bit seeding, all you have are 256 different sequences. Regardless of how long they are.
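The bound on distinct sequences can be counted rather than argued. This is a minimal sketch that re-implements the rand8 idea as a closure (the name `make_rand8` and the 512-byte prefix length are choices made for this example) and tallies the distinct output prefixes over all 256 seed values. The count cannot exceed 256, and because every slot starts from the same value and the seed only selects the first slot, it may come out far lower.

```perl
use strict;
use warnings;

# Re-implementation of the rand8 idea: 24 slots visited round-robin,
# each advanced by one 8-bit LCG step per visit. The seed only chooses
# which slot is visited first.
sub make_rand8 {
    my ($seed) = @_;
    my @slot = (0x00011011) x 24;
    my $i    = $seed % 24;
    return sub {
        $i = ( $i + 1 ) % 24;
        $slot[$i] = ( $slot[$i] * 33 + 251 ) & 255;
        return $slot[$i];
    };
}

# Collect the first 512 output bytes for every 8-bit seed.
my %seen;
for my $seed ( 0 .. 255 ) {
    my $rng = make_rand8($seed);
    $seen{ pack 'C*', map { $rng->() } 1 .. 512 }++;
}
my $distinct = keys %seen;
print "distinct 512-byte prefixes over 256 seeds: $distinct\n";
```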

        Assuming he allows it to self-seed -- no srand() -- even if perchance two of his runs picked adjacent seed-points in the sequence, on average, he'd have to generate 4e6001 / 4e9 = 1e5992 rands before the two sub-sequences would overlap.
        That I do not understand. There are 2^32 seeds. Each of them starts a different sequence. You don't get to start at a random point in the sequence. You could of course keep track of where you are in the sequence, but that requires adding ⌈log2 P⌉ bits to the seed, where P is the length of the period.
        So (ignoring the birthday paradox, imperfect PRNG etc. for a moment), for him to get a dup, he would have to run his program 2**32 times and pick exactly 1 sequence each time. But if he generates 10 each time, that's 10 * 2**32 sequences before he gets a dup.
        I read this as "the more he generates, the more it takes for a duplication to happen". That seems quite counterintuitive to me, and I'm not sure if that's what you mean.
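The ⌈log2 P⌉ figure can be evaluated for the Mersenne Twister period with the core Math::BigInt module. This is a minimal sketch, assuming P = 2^19937 - 1; the helper name `position_bits` is made up for illustration.

```perl
use strict;
use warnings;
use Math::BigInt;

# Bits needed to record a position within a period of length P: ceil(log2 P).
sub position_bits {
    my ($p) = @_;                               # a Math::BigInt, P >= 1
    my $bits = length( $p->as_bin ) - 2;        # binary digits, minus the "0b" prefix
    # An exact power of two needs one bit fewer than its own bit length.
    my $is_pow2 = $p->copy->bdec->band($p)->is_zero;
    return $is_pow2 ? $bits - 1 : $bits;
}

# Assumption: the period in question is 2**19937 - 1.
my $mt_period = Math::BigInt->new(2)->bpow(19937)->bdec;
my $bits      = position_bits($mt_period);
print "bits to store a position in the period: $bits\n";
```

Those bits would come on top of the 32-bit seed, which is the point being made about carrying a position across runs.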
