PerlMonks  

Re^4: Generate a unique ID

by BrowserUk (Pope)
on Nov 15, 2010 at 20:19 UTC (#871570)


in reply to Re^3: Generate a unique ID
in thread Generate a unique ID

You will have won the Lottery in every state and every country,

Most lotteries are won!

The problem with a random number solution, is the quality of the random number generator:

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 110 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 225 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 115 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 28 iterations at -e line 1.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^5: Generate a unique ID
by afoken (Parson) on Nov 15, 2010 at 20:48 UTC

    So, if you can't use (or don't trust) the random number generator, re-implement what an SQL sequence does. As long as your job is restricted to a single machine, and your OS supports at least advisory locks, that should not be too hard. This is very similar to a robust web page visitor counter script (one that does not damage the counter when called in parallel).

    You need a file that contains the current sequence number. All access to that file is protected by locks, so that at any time exactly one process can read and increment the sequence number. The thread Trying to understand flock contains some tips.
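A minimal sketch of such a locked counter file, assuming a single machine and a local filesystem where flock works. The helper name next_sequence_number and the one-number-per-file layout are illustrative assumptions, not something from the thread.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: return the next sequence number kept in $file.
sub next_sequence_number {
    my ($file) = @_;

    # Open read/write and create the file on first use, without truncating it.
    sysopen(my $fh, $file, O_RDWR | O_CREAT)
        or die "Cannot open $file: $!";

    # Block until this process holds the exclusive (advisory) lock.
    flock($fh, LOCK_EX) or die "Cannot lock $file: $!";

    # An empty or unreadable file counts as 0.
    my $line    = <$fh>;
    my $current = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    my $next    = $current + 1;

    # Rewrite the file in place while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $file: $!";
    truncate($fh, 0)       or die "Cannot truncate $file: $!";
    print {$fh} "$next\n"  or die "Cannot write $file: $!";

    # Closing flushes the buffer and releases the lock.
    close($fh) or die "Cannot close $file: $!";
    return $next;
}
```

Every process that calls next_sequence_number on the same file gets a different number, because the read, increment and write all happen under one exclusive lock.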

    If you have to work with different machines and networked filesystems (NFS, CIFS, AFS, ...), don't bet on working locks. Implement the sequence number generator as a dumb TCP/IP server on a high port (>1024), that can handle only one client. Use a (properly locked) counter file on a local disk. Run it on exactly one machine. Make all instances of your program query that server for an individual sequence number (simply by connecting and reading one line). Using TCP sockets automatically makes sure that there can be only one server per network address and port. If you want to be paranoid, use the "lock the DATA handle" trick to prevent multiple instances.
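The one-client-at-a-time server could look roughly like the sketch below. The function names, the closure that supplies the next number, and any host or port you pass in are assumptions made for illustration. In real use the closure would wrap a properly locked counter file on local disk, as described above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Create the listening socket. Binding normally fails if another server
# already owns this address and port, which keeps the server unique.
sub open_listener {
    my ($host, $port) = @_;
    my $listener = IO::Socket::INET->new(
        LocalAddr => $host,
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
    ) or die "Cannot listen on $host:$port: $!";
    return $listener;
}

# Serve exactly one client: accept, send one number as a line, hang up.
# $next_id is a code reference that returns the next sequence number.
sub serve_one {
    my ($listener, $next_id) = @_;
    my $client = $listener->accept or die "accept failed: $!";
    print {$client} $next_id->(), "\n";
    close($client);
    return;
}

# Client side: connect, read one line, return the number.
sub fetch_id {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Cannot connect to $host:$port: $!";
    my $line = <$sock>;
    close($sock);
    die "Server sent nothing\n" unless defined $line;
    chomp $line;
    return $line;
}
```

The server's main loop is then just serve_one($listener, $next_id) while 1; and because clients are handled strictly one after another, no two of them can receive the same number.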

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^5: Generate a unique ID
by sundialsvc4 (Abbot) on Nov 17, 2010 at 14:37 UTC

    I do not dispute that a PRNG has a non-zero chance of repetition.   But I have found, if only through empirical observation, that a reasonable implementation of this strategy is a good pragmatic solution to the collision problem.

    In actual implementation, I would write a loop that attempts, say, ten times, to produce a random directory-name and to create a new directory having that name.   (If the loop failed to do so, each time, it would generate a warning message to STDERR, because even one actual collision would be, in my book, quite unexpected.)   The loop would not permit the directory to be used if it already existed (thus pushing the “atomicity problem” off to the file system).

    The odds of even one name-collision are extremely small; the odds of ten collisions in a row are almost-infinitely smaller.

    And once the program has acquired a temporary directory that is all its own, it can build whatever files it wants within that directory, and can do with them as it pleases.

    Upon termination, it destroys the directory and its content.

    I would probably add a short prefix to the random string, both to make it easier to recognize why a given directory-name is present in /tmp, and to simplify the process of removing them en masse.
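The loop described above might be sketched as follows. The function name, the 32-bit hex suffix and the argument order are assumptions for illustration only. mkdir fails with EEXIST when the name is already taken, so the filesystem decides who owns the directory.

```perl
use strict;
use warnings;
use Errno qw(EEXIST);

# Hypothetical helper: create and return a fresh directory under $base
# whose name starts with $prefix, trying at most $max_tries random names.
sub make_private_dir {
    my ($base, $prefix, $max_tries) = @_;
    $max_tries ||= 10;

    for my $attempt (1 .. $max_tries) {
        my $name = sprintf '%s/%s%08x', $base, $prefix, int(rand(2**32));

        # mkdir either creates the directory or fails; it never hands an
        # existing directory to us, so the check-and-create step is atomic.
        return $name if mkdir $name, 0700;

        # Anything other than "already exists" is a real error.
        die "Cannot create $name: $!" unless $! == EEXIST;
        warn "Name collision on $name (attempt $attempt)\n";
    }
    die "No unused directory name found after $max_tries attempts\n";
}
```

A caller would use something like my $dir = make_private_dir('/tmp', 'myapp-'); and remove the directory and its contents on exit, for example with File::Path's remove_tree.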

      The odds of even one name-collision are extremely small;

      Do you consider a 1 in 30 chance as "extremely small"?

      >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
      prob: 3.277%

      >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
      prob: 3.277%

      >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
      prob: 3.277%

      >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
      prob: 3.277%

      What I've implemented is this (the code is to be part of an XS module):

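      void makeDir( void ) {
          int i = 10;
          do {
              /* name = process id (hex) + low 16 bits of the uptime in ms */
              sprintf( dir, "c:/tmp/MYAPP%04x%04X/",
                       GetCurrentProcessId(),
                       (unsigned)( GetTickCount64() & 0xffff ) );
              --i || expire( -99999 );
          /* provisional check: _mkdir() itself returns -1 and sets errno to EEXIST */
          } while( _mkdir( dir ) == ERROR_FILE_EXISTS );
          GetLastError() && expire( - GetLastError() );
          return;
      }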

      GetTickCount64() returns the uptime in milliseconds. By truncating it to 16-bits it proves to be a better rand() than MS' CRT rand() :).

      The error codes are provisional!


        a 1 in 30 chance
        That's just because your PRNG is only 15 bits:

        $ perl -e 'printf "%.3f%%\n", 2**15/1e4'
        3.277%
        With a decent PRNG, you'd have a much lower chance of collisions.
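As a rough check on the 15-bit explanation, the birthday-problem estimate for the mean number of draws before the first repeat is about sqrt(pi * N / 2) for N equally likely values. The sketch below assumes an ideal uniform generator. The 48-bit figure is only an illustrative comparison, not a measurement from this thread.

```perl
use strict;
use warnings;

# Approximate mean number of draws before the first repeat when drawing
# uniformly from $space equally likely values (birthday-problem estimate).
sub expected_draws_to_first_repeat {
    my ($space) = @_;
    my $pi = 4 * atan2(1, 1);
    return sqrt($pi * $space / 2);
}

printf "15-bit generator: about %.0f draws\n",
    expected_draws_to_first_repeat(2**15);
printf "48-bit generator: about %.0f draws\n",
    expected_draws_to_first_repeat(2**48);
```

For 2**15 values the estimate is roughly 227 draws, which is in line with the repeats after 110, 225, 115 and 28 iterations shown earlier in the thread.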

        Consider the following practical algorithm:

        sub random_string {
            my $self   = shift;
            my $result = shift;
            $result = '' unless defined $result;

            my @letters   = ('A'..'Z', 'a'..'z');
            my @letternum = ('A'..'Z', 'a'..'z', '0'..'9');

            # Produces a random alphanumeric identifier, 8 chars long, first char alphabetic.
            # rand @array (not rand $#array), so that the last element can be picked too.
            $result .= $letters[rand @letters];
            $result .= join '', map { $letternum[rand @letternum] } 1..7;
            return $result;
        }

        One million repetitions later, the same string was never produced twice.   I am quite confident that, if I had ten or even a hundred times as much time to waste waiting for Godot to repeat himself, the result would have been exactly the same.   So, I think that it is quite defensible to say, “it ain’t nevah gonna” happen.   Once you have reduced the probability acceptably close to zero (and of course, have demonstrated in your test-suite that it is, in fact, robust), then ...
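The claim can be checked with the standard birthday approximation, p = 1 - exp(-n(n-1)/(2N)), for n draws from N equally likely strings. This is a sketch that assumes an ideal uniform generator and N = 52 * 62**7 (one letter followed by seven alphanumerics). A weaker generator, such as a 15-bit rand, would do much worse.

```perl
use strict;
use warnings;

# Birthday approximation: probability of at least one repeat among
# $draws uniform picks from $space equally likely values.
sub collision_probability {
    my ($draws, $space) = @_;
    return 1 - exp(-$draws * ($draws - 1) / (2 * $space));
}

# One letter (52 choices) followed by seven alphanumerics (62 choices each).
my $space = 52 * 62**7;

for my $draws (1e6, 1e7, 1e8) {
    printf "%9d draws: p(collision) = %.4f\n",
        $draws, collision_probability($draws, $space);
}
```

Under these assumptions a million draws collide in roughly 0.3% of runs, so seeing none is expected. At ten million draws the chance rises to roughly 24%, and at a hundred million a repeat is practically certain.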

        “Well, that’s close enough to zero for peace work ...”

Re^5: Generate a unique ID
by Your Mother (Canon) on Nov 17, 2010 at 15:32 UTC

    I'm curious about the why/which in this. I just ran your test code several times, and even raised the limit to 1e8 and 1e9, both of which ran out of memory before completing, and I got no "Repeat" deaths. This is on a modern Linux box with Perl 5.8. What hardware/perl combination makes yours bomb out so early?

      I did mention the reason. MS' CRT rand() function uses 15-bits only.

      This also affects Perl directly, because Perl's rand is implemented in terms of the CRT function.

      I'd normally use Math::Random::MT for anything where I need a decent random number generator, but the standard implementation (and therefore the Perl module) is not thread-safe due to some static internal buffers.
