Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^5: Generate a unique ID

by sundialsvc4 (Abbot)
on Nov 17, 2010 at 14:37 UTC ( #871999=note: print w/ replies, xml ) Need Help??


in reply to Re^4: Generate a unique ID
in thread Generate a unique ID

I do not argue that a PRNG has a non-zero chance of repetition.   But I have found, if only through empirical observation, that a reasonable implementation of this strategy is a good pragmatic solution to the collision problem.

In actual implementation, I would write a loop that attempts, say, ten times, to produce a random directory-name and to create a new directory having that name.   (If the loop failed to do so, each time, it would generate a warning message to STDERR, because even one actual collision would be, in my book, quite unexpected.)   The loop would not permit the directory to be used if it already existed (thus pushing the “atomicity problem” off to the file system).

The odds of even one name-collision are extremely small; the odds of ten collisions in a row are almost-infinitely smaller.

And once the program has acquired a temporary directory that is all its own, it can build whatever files it wants within that directory, and can do with them as it pleases.

Upon termination, it destroys the directory and its content.

I would probably add a short prefix to the random string, both to make it easier to recognize why a given directory-name is present in /tmp, and to simplify the process of removing them en masse.


Comment on Re^5: Generate a unique ID
Re^6: Generate a unique ID
by BrowserUk (Pope) on Nov 17, 2010 at 15:02 UTC
    The odds of even one name-collision are extremely small;

    Do you consider a 1 in 30 chance as "extremely small"?

    >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h) +/1e4)" prob: 3.277% >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h) +/1e4)" prob: 3.277% >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h) +/1e4)" prob: 3.277% >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h) +/1e4)" prob: 3.277%

    What I've implemented is this (the code is to be part of an XS module):

    void makeDir( void ) { in t i = 10; do { sprintf( dir, "c:/tmp/MYAPP04x%04X/", GetCurrentProcessId(), GetTickCount64() & 0xffff ); --i || expire( -99999 ); } while( _mkdir( dir ) == ERROR_FILE_EXISTS ); GetLastError() && expire( - GetLastError() ); return; }

    GetTickCount64() returns the uptime in milliseconds. By truncating it to 16-bits it proves to be a better rand() than MS' CRT rand() :).

    The error codes are provisional!


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      a 1 in 30 chance
      That's just because your PRNG is only 15 bits:

      $ perl -e 'printf "%.3f%", 2**15/1e4' 3.277%
      With a decent PRNG, you'd have a much lower chance of collisions.

        I know :) I did mention that in the post above.

        I also mentioned earlier in the thread, that its use inside a function called truly_random() inside Data::UUID (amongst other things) makes that module suspect.

      Consider the following practical algorithm:

      sub random_string { my $self = shift; my $result = shift; $result = '' unless (defined($result)); my @letters = ('A'..'Z', 'a'..'z'); my @letternum = ('A'..'Z', 'a'..'z', '0'..'9'); # PRODUCES RANDOM ALPHANUMERIC IDENT 8 CHARS LONG, FIRST CHAR ALPHA. $result .= $letters[rand $#letters]; $result .= join "", map { $letternum[rand $#letternum] } 1..7; return $result; }

      One million repetitions later, the same string was never produced twice.   I am quite confident that, if I had ten or even a hundred times as much time to waste waiting for Godot to repeat himself, the result would have been exactly the same.   So, I think that it is quite defensible to say, “it ain’t nevah gonna” happen.   Once you have reduced the probability acceptably close to zero (and of course, have demonstrated in your test-suite that it is, in fact, robust), then ...

      “Well, that’s close enough to zero for peace work ...”

        Hm. With 52 choices for the first character and 62 thereafter, a 4 character is capable of producing 52 * 62**3 different strings:12,393,056

        And yet:

        C:\test>junk 4 57147Dup after 57283 reps at C:\test\junk.pl line 18. 57283 C:\test>junk 4 51003Dup after 51478 reps at C:\test\junk.pl line 18. 51478 C:\test>junk 4 2Dup after 23961 reps at C:\test\junk.pl line 18. 23961 C:\test>junk 4 4Dup after 44858 reps at C:\test\junk.pl line 18. 44858

        A tiny part of that is explainable by your using rand( $#array ) instead of rand( @array ) and therefore never picking the final character in those array, giving 51 * 61**3 := 11576031. But the earlyness of the repeat--as low a 24,000--is way too early to be explained even by the Birthday paradox.

        Minorly tweaked version of your code used above:

        #! perl -slw use strict; sub random_string { my $n = shift; my $result = ''; my @letters = ('A'..'Z', 'a'..'z'); my @letternum = ('A'..'Z', 'a'..'z', '0'..'9'); $result .= $letters[ rand $#letters ]; $result .= join "", map { $letternum[ rand $#letternum ] } 1..$n; return $result; } my %h; printf("\r$_"), $h{ random_string( $ARGV[0] ) }++ and die "Dup after $ +_ reps" for 1 .. 1e6;

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://871999]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (8)
As of 2014-12-21 12:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (104 votes), past polls