http://www.perlmonks.org?node_id=871561


in reply to Re: Generate a unique ID
in thread Generate a unique ID

set up a randomly-named “spill directory.”

Using a directory name instead of a prefix is a good notion.

mkdir is atomic and returns false if the directory already exists. The problem is, how do you distinguish between failure to create because it exists and failure to create for some other reason, e.g. permissions?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Generate a unique ID
by sundialsvc4 (Abbot) on Nov 15, 2010 at 19:58 UTC

    If you use a random identifier that is long enough, I do believe that it becomes irrelevant to seriously consider “collisions.”   You will have won the Lottery in every state and every country, and retired to a place where you do not have to give a tinker’s dam about computers, long before a collision actually occurs.

    The sequence that you expect to succeed will be:   to create the directory, write some sentinel file into it, and verify that the sentinel file does exist.   If you can do all that, you’re good to go.
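The create, write, verify sequence described above could be sketched roughly as follows. The helper name `claim_spill_dir` and the sentinel file name `.owner` are made up for illustration; nothing in the thread prescribes them.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: create $dir, drop a sentinel file into it, and confirm
# the sentinel really exists. Returns the sentinel path on success, or
# undef (with the reason left in $!) on failure.
sub claim_spill_dir {
    my ($dir) = @_;

    # mkdir fails if $dir already exists, so success means we created it.
    mkdir($dir, 0700) or return;

    # The sentinel name '.owner' is an assumption, not from the thread.
    my $sentinel = File::Spec->catfile($dir, '.owner');
    open(my $fh, '>', $sentinel) or return;
    print {$fh} "$$\n"           or return;   # record our process id
    close($fh)                   or return;

    return -e $sentinel ? $sentinel : undef;
}
```

A caller would treat an undefined return as "could not claim this directory" and inspect `$!` for the reason.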

    If you cannot create the directory, then I submit that it is safe to assume that the reason is “permissions.”   Even though meteors can fall from the sky and lodge themselves in your microwave oven just to the left of your turkey sandwich, you don’t need to test for them.

      You will have won the Lottery in every state and every country,

      Most lotteries are won!

      The problem with a random number solution is the quality of the random number generator:

      >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for + 1 .. 1e6"
      Repeat after 110 iterations at -e line 1.

      >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for + 1 .. 1e6"
      Repeat after 225 iterations at -e line 1.

      >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for + 1 .. 1e6"
      Repeat after 115 iterations at -e line 1.

      >perl -E"++$n, $h{rand()}++ and die qq[Repeat after $n iterations] for + 1 .. 1e6"
      Repeat after 28 iterations at -e line 1.


        So, if you can't use (or don't trust) the random number generator, re-implement what an SQL sequence does. As long as your job is restricted to a single machine, and your OS supports at least advisory locks, that should not be too hard. This is very similar to a robust web page visitor counter script (one that does not damage the counter when called in parallel).

        You need a file that contains the current sequence number. All access to that file is protected by locks, so that at any time exactly one process can read and increment the sequence number. The thread Trying to understand flock contains some tips.
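A minimal sketch of such a locked counter file, assuming a single machine, a local filesystem, and working advisory locks. The function name `next_sequence` is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: read the number stored in $file, add one, write it
# back, and return the new value, all while holding an exclusive lock.
sub next_sequence {
    my ($file) = @_;

    # Open read/write and create the file if missing, without truncating it.
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX)                      or die "flock $file: $!";

    # An empty or garbled file is treated as a counter of zero.
    my $line = <$fh>;
    my $n = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $n++;

    seek($fh, 0, SEEK_SET) or die "seek $file: $!";
    truncate($fh, 0)       or die "truncate $file: $!";
    print {$fh} "$n\n"     or die "write $file: $!";
    close($fh)             or die "close $file: $!";   # flushes, then releases the lock

    return $n;
}
```

Each caller gets a distinct number as long as every process goes through the same function and the same file.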

        If you have to work with different machines and networked filesystems (NFS, CIFS, AFS, ...), don't bet on working locks. Implement the sequence number generator as a dumb TCP/IP server on a high port (>1024), that can handle only one client. Use a (properly locked) counter file on a local disk. Run it on exactly one machine. Make all instances of your program query that server for an individual sequence number (simply by connecting and reading one line). Using TCP sockets automatically makes sure that there can be only one server per network address and port. If you want to be paranoid, use the "lock the DATA handle" trick to prevent multiple instances.
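The one-client-at-a-time server could look roughly like this. It is only a sketch: the sub names (`serve_client`, `run_server`, `fetch_sequence`) are hypothetical, and `$next` stands in for whatever code hands out the next number, for example a wrapper around a locked counter file on local disk.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Send one sequence number, as a single line, to a connected client.
# $next is a code ref returning the next number (a placeholder here).
sub serve_client {
    my ($client, $next) = @_;
    print {$client} $next->(), "\n";
    close($client);
}

# Accept and serve clients strictly one after another. The constructor
# fails if the port cannot be bound, e.g. because a server already holds it.
sub run_server {
    my ($port, $next) = @_;
    my $server = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 5,
        Proto     => 'tcp',
    ) or die "cannot listen on port $port: $@";

    while (my $client = $server->accept) {
        serve_client($client, $next);
    }
}

# Client side: connect, read one line, return the number.
sub fetch_sequence {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect to $host:$port: $@";

    my $n = <$sock>;
    close($sock);
    defined $n or die "server sent nothing\n";
    chomp $n;
    return $n;
}
```

The server process would call something like `run_server(12345, sub { ... })` with a port of its choosing, and every worker would call `fetch_sequence($host, 12345)`; the port number here is an arbitrary example.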

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        I do not argue that a PRNG has a non-zero chance of repetition.   But I have found, if only through empirical observation, that a reasonable implementation of this strategy is a good pragmatic solution to the collision problem.

        In actual implementation, I would write a loop that attempts, say, ten times, to produce a random directory-name and to create a new directory having that name.   (If the loop failed to do so, each time, it would generate a warning message to STDERR, because even one actual collision would be, in my book, quite unexpected.)   The loop would not permit the directory to be used if it already existed (thus pushing the “atomicity problem” off to the file system).

        The odds of even one name-collision are extremely small; the odds of ten collisions in a row are almost-infinitely smaller.

        And once the program has acquired a temporary directory that is all its own, it can build whatever files it wants within that directory, and can do with them as it pleases.

        Upon termination, it destroys the directory and its content.

        I would probably add a short prefix to the random string, both to make it easier to recognize why a given directory-name is present in /tmp, and to simplify the process of removing them en masse.
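The loop described above might be sketched like this. The sub names, the 16-character name length, and the `spill_` prefix in the usage note are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(remove_tree);

# Hypothetical helper: try up to $tries random names under $base and
# return the first directory that mkdir manages to create.
sub make_spill_dir {
    my ($base, $prefix, $tries) = @_;
    my @chars = ('a' .. 'z', 0 .. 9);

    for my $attempt (1 .. $tries) {
        my $name = $prefix . join '', map { $chars[rand @chars] } 1 .. 16;
        my $dir  = File::Spec->catdir($base, $name);

        # mkdir refuses an existing name, so the file system settles any race.
        return $dir if mkdir($dir, 0700);

        warn "attempt $attempt: cannot create $dir: $!\n";
    }
    die "gave up after $tries attempts\n";
}

# Remove the directory and everything in it when the program is done.
sub drop_spill_dir {
    my ($dir) = @_;
    remove_tree($dir);
}
```

Typical use would be `my $dir = make_spill_dir(File::Spec->tmpdir, 'spill_', 10);` at startup and `drop_spill_dir($dir)` at exit, perhaps from an `END` block.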

        I'm curious about the why/which in this. I just ran your test code several times and even raised the limit to 1e8 and 1e9, both of which ran out of memory before completing, and I got no "Repeat" deaths. This is on a modern Linux box with Perl 5.8. What hardware/perl combination makes yours bomb out so early?
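The thread does not say which build produced the early repeats. One possible explanation, offered here only as an assumption, is a perl whose rand() has few random bits: with 15 bits there are only 32768 distinct values, and the birthday problem predicts a first repeat after a couple of hundred draws. The sketch below prints what the local perl reports and the estimate for 15 bits; the sub name is hypothetical.

```perl
use strict;
use warnings;
use Config;

# Expected number of draws before the first repeat when sampling uniformly
# from $n distinct values, using the birthday approximation sqrt(pi * n / 2).
sub expected_draws_before_repeat {
    my ($n) = @_;
    my $pi = 4 * atan2(1, 1);
    return sqrt($pi * $n / 2);
}

# randbits is what this perl was configured with; it may be absent.
my $bits = $Config{randbits};
printf "randbits on this perl: %s\n", defined $bits ? $bits : 'unknown';
printf "with 15 bits, expect a repeat after roughly %.0f draws\n",
    expected_draws_before_repeat(2**15);
```

Comparing the reported randbits on the two machines would show whether this explanation fits.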

Re^3: Generate a unique ID
by JavaFan (Canon) on Nov 15, 2010 at 20:35 UTC
    Problem is, how do you distinguish between failure to create because it exists, and failure to create for some other reason?
    You'd look at the error code. If the directory exists, the error will be EEXIST, and if it fails for some other reason, the error will be different.
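A small sketch of that check, using the standard Errno module instead of a hard-coded number. The sub name `try_mkdir` is hypothetical.

```perl
use strict;
use warnings;
use Errno qw(EEXIST);

# Hypothetical helper: returns 'created' or 'exists', and dies for any
# other kind of failure (permissions, missing parent directory, ...).
sub try_mkdir {
    my ($dir) = @_;
    return 'created' if mkdir($dir);
    return 'exists'  if $! == EEXIST;    # $!{EEXIST} tests the same thing
    die "mkdir $dir failed: $!\n";
}
```

Comparing against the `EEXIST` constant avoids depending on the numeric value the platform happens to use.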
Re^3: Generate a unique ID
by zwon (Abbot) on Nov 16, 2010 at 00:44 UTC

    So far it looks to me like you are trying to reinvent the wheel. You need to create a directory with a unique name?

    $dir = File::Temp->newdir;

    That's it. If you want, you can specify a template for the directory name, which can include a timestamp or whatever you want. If it returns failure, it's for some other reason.
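For example, a template with a recognizable prefix might be used like this. The helper name `new_run_dir` and the prefix in the usage note are placeholders; the trailing X characters are what File::Temp replaces with random characters.

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper: create a uniquely named directory under the system
# temp directory, named <prefix> plus eight random characters.
sub new_run_dir {
    my ($prefix) = @_;
    # TMPDIR => 1 places the template inside the system temp directory.
    return File::Temp->newdir("${prefix}XXXXXXXX", TMPDIR => 1);
}
```

The returned object stringifies to the directory path, e.g. `my $dir = new_run_dir('myjob_'); open my $fh, '>', "$dir/part1" or die $!;`, and the directory is removed when the object goes out of scope.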

      I started out looking to create unique filenames that, during a later stage of the processing, I could differentiate via globbing from those created by other runs of the code.

      Once sundialsvc4 suggested using a directory--which hadn't crossed my mind earlier--using a temporary directory and creating all my files within it makes perfect sense.

      I'm not sure that module buys me anything useful over something like:

      my $path = $ENV{ TMP };
      my $dir  = 'temp' . join '', map{ ('A'..'Z')[rand 26] } 1 .. 4;
      # 17 is EEXIST here: move on to the next name only when the directory already exists
      ++$dir until mkdir $path . $dir or $! != 17 and die "$path$dir : $!";

      But I guess that depends whether this code will ever make it to another platform. I don't see that ever happening.


Re^3: Generate a unique ID
by happy.barney (Friar) on Nov 16, 2010 at 08:09 UTC
    use -X
    unless (mkdir $dir) {
        -d $dir && ... dir exists;
        -w _    || ... not writable by euid
        -x _    || ... not traversable by euid
    }
    If you are looking for an atomic test, you can also open files with O_CREAT | O_EXCL. Then you can just use pid + start time + local counter:
    $fh = IO::File->new ($filename, O_EXCL | O_CREAT | ...)
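A sketch of the same idea with the built-in sysopen, assuming the name is built from pid, start time and a per-process counter as suggested. The sub name and the `-` separator are illustrative choices.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Spec;

my $counter = 0;    # per-process counter

# Hypothetical helper: atomically create a new file in $dir. O_EXCL makes
# the open fail if the name already exists, so no separate test is needed.
sub create_unique_file {
    my ($dir) = @_;

    # $$ is the pid and $^T is the time this program started.
    my $name = File::Spec->catfile($dir, join('-', $$, $^T, ++$counter));

    sysopen(my $fh, $name, O_WRONLY | O_CREAT | O_EXCL)
        or die "cannot create $name: $!";
    return ($name, $fh);
}
```

Because creation and the existence check happen in one system call, there is no window in which another process can slip in.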
      use -X

      -X uses stat and that takes quite a long time, especially on Win32. It leaves the possibility that between your failed attempt and the return of the test, another program (or another copy of this program), will get in and create the directory.

      But as JavaFan pointed out, that isn't necessary. $! tells me the reason for failure. See Re^4: Generate a unique ID.

