PerlMonks  

Re^3: Generate a unique ID

by sundialsvc4 (Abbot)
on Nov 15, 2010 at 19:58 UTC (#871564)


in reply to Re^2: Generate a unique ID
in thread Generate a unique ID

If you use a random identifier that is long enough, I do believe that it becomes irrelevant to seriously consider “collisions.”   You will have won the Lottery in every state and every country, and retired to a place where you do not have to give a tinker’s dam about computers, long before a collision actually occurs.

The sequence that you expect to succeed will be:   to create the directory, write some sentinel file into it, and verify that the sentinel file does exist.   If you can do all that, you’re good to go.
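The create, write, verify sequence described above could look roughly like this in Perl. This is only a sketch: `claim_directory` and the `.sentinel` file name are hypothetical, chosen for illustration, and writing the process ID into the sentinel is an assumption rather than something the post prescribes.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: claim $dir by creating it, writing a sentinel file
# into it, and checking that the sentinel is really there.
# Returns the sentinel path on success, or undef if any step fails
# (with $! set by the failing system call).
sub claim_directory {
    my ($dir) = @_;

    # mkdir fails if the directory already exists, so a successful call
    # means this process created it.
    mkdir($dir, 0700) or return;

    # Write the sentinel (assumption: it holds our process ID).
    my $sentinel = File::Spec->catfile($dir, '.sentinel');
    open(my $out, '>', $sentinel) or return;
    print {$out} "$$\n" or return;
    close($out) or return;

    # Verify: the sentinel can be read back and holds what we wrote.
    open(my $in, '<', $sentinel) or return;
    my $line = <$in>;
    close($in);
    return unless defined $line && $line eq "$$\n";

    return $sentinel;
}
```

A caller would treat an undefined return value as "this directory is not mine" and inspect `$!` if it cares why.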

If you cannot create the directory, then I submit that it is safe to assume that the reason is “permissions.”   Even though meteors can fall from the sky and lodge themselves in your microwave oven just to the left of your turkey sandwich, you don’t need to test for them.


Re^4: Generate a unique ID
by BrowserUk (Pope) on Nov 15, 2010 at 20:19 UTC
    You will have won the Lottery in every state and every country,

    Most lotteries are won!

    The problem with a random number solution is the quality of the random number generator:

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 110 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 225 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 115 iterations at -e line 1.

    >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for 1 .. 1e6"
    Repeat after 28 iterations at -e line 1.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      So, if you can't use (or don't trust) the random number generator, re-implement what an SQL sequence does. As long as your job is restricted to a single machine, and your OS supports at least advisory locks, that should not be too hard. This is very similar to a robust web page visitor counter script (one that does not damage the counter when called in parallel).

      You need a file that contains the current sequence number. All access to that file is protected by locks, so that at any time exactly one process can read and increment the sequence number. The thread Trying to understand flock contains some tips.
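A minimal sketch of such a locked counter file, assuming a single machine, a local filesystem, and working advisory locks. The name `next_sequence_number` and the one-number-per-file layout are assumptions made for the example, not something taken from the thread.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_RDWR O_CREAT);

# Hypothetical helper: return the next number from a counter file,
# holding an exclusive advisory lock for the whole read-increment-write.
sub next_sequence_number {
    my ($file) = @_;

    # O_RDWR | O_CREAT opens without truncating, so an existing count survives.
    sysopen(my $fh, $file, O_RDWR | O_CREAT, 0644)
        or die "Cannot open $file: $!";

    # Blocks until no other process holds the lock.
    flock($fh, LOCK_EX) or die "Cannot lock $file: $!";

    # A missing or empty file counts as zero.
    my $line    = <$fh>;
    my $current = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    my $next    = $current + 1;

    # Rewrite the file in place while still holding the lock.
    seek($fh, 0, SEEK_SET)  or die "Cannot rewind $file: $!";
    truncate($fh, 0)        or die "Cannot truncate $file: $!";
    print {$fh} "$next\n"   or die "Cannot write $file: $!";

    # close flushes the buffer and releases the lock.
    close($fh) or die "Cannot close $file: $!";
    return $next;
}
```

The file is opened once and kept open across the read and the write, so there is no window in which a second process could read the old value between the two steps.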

      If you have to work with different machines and networked filesystems (NFS, CIFS, AFS, ...), don't bet on working locks. Implement the sequence number generator as a dumb TCP/IP server on a high port (>1024), that can handle only one client. Use a (properly locked) counter file on a local disk. Run it on exactly one machine. Make all instances of your program query that server for an individual sequence number (simply by connecting and reading one line). Using TCP sockets automatically makes sure that there can be only one server per network address and port. If you want to be paranoid, use the "lock the DATA handle" trick to prevent multiple instances.
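The server described above might be sketched as follows, using the core IO::Socket::INET module. The helper names (`open_listener`, `serve_one_client`, `fetch_sequence_number`) are hypothetical, and the number source is passed in as a code reference so that any properly locked counter on a local disk can be plugged in.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: open the listening socket. Binding fails (returns
# undef) if another server already owns this address and port, which is
# what limits the network to a single server instance.
sub open_listener {
    my ($addr, $port) = @_;
    return IO::Socket::INET->new(
        LocalAddr => $addr,
        LocalPort => $port,
        Listen    => 1,
        Proto     => 'tcp',
    );
}

# Accept one client, send it the next sequence number as a single line,
# and close the connection. $next_number is a code ref supplying the number.
sub serve_one_client {
    my ($listener, $next_number) = @_;
    my $client = $listener->accept or die "accept failed: $!";
    my $number = $next_number->();
    print {$client} "$number\n";
    close($client);
    return $number;
}

# Client side: connect, read one line, return the number.
sub fetch_sequence_number {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Cannot connect to $host:$port: $!";
    my $line = <$sock>;
    close($sock);
    die "No reply from $host:$port\n" unless defined $line;
    chomp $line;
    return $line;
}
```

A real server would call `serve_one_client` in an endless loop. Because each call handles exactly one connection from accept to close, clients are served strictly one at a time.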

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      I do not dispute that a PRNG has a non-zero chance of repetition.   But I have found, if only through empirical observation, that a reasonable implementation of this strategy is a good pragmatic solution to the collision problem.

      In an actual implementation, I would write a loop that tries, say, ten times to generate a random directory name and to create a new directory with that name.   (Each time an attempt failed, the loop would write a warning message to STDERR, because even one actual collision would be, in my book, quite unexpected.)   The loop would refuse to use a directory that already existed (thus pushing the “atomicity problem” off to the file system).

      The odds of even one name-collision are extremely small; the odds of ten collisions in a row are almost-infinitely smaller.

      And once the program has acquired a temporary directory that is all its own, it can build whatever files it wants within that directory, and can do with them as it pleases.

      Upon termination, it destroys the directory and its content.

      I would probably add a short prefix to the random string, both to make it easier to recognize why a given directory-name is present in /tmp, and to simplify the process of removing them en masse.
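The loop described in this post might look like the sketch below. The helper names, the 16-character suffix, and the use of `remove_tree` from File::Path (version 2 or later) are assumptions made for illustration; the post itself fixes only the retry count, the warning on failure, and the prefix idea.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(remove_tree);

# Hypothetical helper: try up to $attempts random names under $parent,
# each starting with $prefix, and return the first directory that this
# process managed to create. Dies if every attempt fails.
sub make_private_dir {
    my ($parent, $prefix, $attempts) = @_;
    my @chars = ('a' .. 'z', '0' .. '9');

    for my $try (1 .. $attempts) {
        my $suffix = join '', map { $chars[ int rand @chars ] } 1 .. 16;
        my $dir    = File::Spec->catdir($parent, $prefix . $suffix);

        # mkdir either creates the directory or fails; it never hands
        # back a directory that already existed.
        return $dir if mkdir($dir, 0700);

        warn "Attempt $try: cannot create $dir: $!\n";
    }
    die "No directory created after $attempts attempts\n";
}

# Remove the directory and everything in it once the work is done.
# Returns true if nothing is left at that path.
sub destroy_private_dir {
    my ($dir) = @_;
    remove_tree($dir);
    return !-e $dir;
}
```

The core File::Temp module provides `tempdir`, a ready-made take on the same idea, which may be preferable to hand-rolled code where it is available.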

        The odds of even one name-collision are extremely small;

        Do you consider a 1 in 30 chance as "extremely small"?

        >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
        prob: 3.277%

        >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
        prob: 3.277%

        >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
        prob: 3.277%

        >perl -E"$h{rand()}++ for 1..1e6; printf qq[prob: %.3f%%\n], (keys(%h)/1e4)"
        prob: 3.277%

        What I've implemented is this (the code is to be part of an XS module):

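        /* Needs <windows.h>, <direct.h>, <errno.h> and <stdio.h>;
           dir and expire() are defined elsewhere in the module. */
        void makeDir( void ) {
            int i = 10;   /* retry budget */
            int rc;
            do {
                /* name = prefix + process id + low 16 bits of the uptime in ms */
                sprintf( dir, "c:/tmp/MYAPP%04x%04X/",
                    (unsigned)GetCurrentProcessId(),
                    (unsigned)( GetTickCount64() & 0xffff ) );
                --i || expire( -99999 );
                rc = _mkdir( dir );
            } while( rc == -1 && errno == EEXIST );   /* _mkdir returns -1 and sets errno on failure */
            rc && expire( -errno );
            return;
        }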

        GetTickCount64() returns the uptime in milliseconds. Truncated to 16 bits, it proves to be a better rand() than MS's CRT rand() :).

        The error codes are provisional!



      I'm curious about the why/which in this. I just ran your test code several times, and even raised the limit to 1e9 and 1e8, which both ran out of memory before completing, and I got no "Repeat" deaths. This is on a modern Linux box with Perl 5.8. What hardware/perl combination makes yours bomb out so early?

        I did mention the reason. MS's CRT rand() function produces only 15 bits.

        This also affects Perl directly, because Perl's rand is implemented in terms of the CRT function.

        I'd normally use Math::Random::MT for anything where I need a decent random number generator, but the standard implementation (and therefore the Perl module) is not thread-safe due to some static internal buffers.
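The early repeats reported in this thread are roughly what a 15-bit generator should produce. As a rough check, the sketch below computes the "birthday" probability of at least one repeat, assuming uniform and independent draws (an idealization of any real rand()). The helper name `repeat_probability` is hypothetical; `$Config{randbits}` is the build-time setting that records how many random bits Perl's rand() was configured with.

```perl
use strict;
use warnings;
use Config;

# Probability that $draws values picked uniformly from $range possible
# values contain at least one repeat (the "birthday" calculation).
sub repeat_probability {
    my ($draws, $range) = @_;
    return 1 if $draws > $range;   # pigeonhole: a repeat is certain

    # Chance that every draw so far missed all earlier values.
    my $all_distinct = 1;
    $all_distinct *= 1 - $_ / $range for 0 .. $draws - 1;
    return 1 - $all_distinct;
}

my $range = 2**15;   # a 15-bit generator has 32768 distinct outputs
printf "%4d draws: %.1f%% chance of a repeat\n",
    $_, 100 * repeat_probability($_, $range) for 28, 110, 225;

# Number of random bits behind rand() in the perl running this script.
my $bits = defined $Config{randbits} ? $Config{randbits} : 'unknown';
print "randbits here: $bits\n";
```

With 32768 possible values, the chance of a repeat passes one half after a little over 200 draws, so first repeats between 28 and 225 iterations are unsurprising. A build with more random bits pushes that point far out, which would fit the Linux run that exhausted memory before seeing any repeat.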
