Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Generate a unique ID

by BrowserUk (Pope)
on Nov 15, 2010 at 09:06 UTC ( #871418=perlquestion: print w/ replies, xml ) Need Help??
BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking to generate a unique ID for a given run of a program, that will distinguish it from any other--previous, subsequent or concurrent--run of that program on that same system.

I'm seeking to avoid semi-random algorithms like GUID/UUID and similar.

I'm not concerned with trying to detect deliberate attempts to fake out the mechanism.

I'm thinking to combine: process id with current date-time to the highest resoltion available. Is that enough?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Generate a unique ID
Re: Generate a unique ID
by Corion (Pope) on Nov 15, 2010 at 09:11 UTC

    I also include the program name (from $0) and the hostname, just in case I decide later on to spread out the program to multiple machines. In my case, I write to logfiles, and it is necessary to see on what machine a program was launched, and which program (or under which name).

    Other than these two items, no other attributes come to mind.

Re: Generate a unique ID
by happy.barney (Pilgrim) on Nov 15, 2010 at 09:17 UTC
    I think that is enough if you are sure that your program runs at least one "time unit" ... time => sleep (1), getimeofday => usleep (1)
Re: Generate a unique ID
by moritz (Cardinal) on Nov 15, 2010 at 09:26 UTC
    I'm thinking to combine: process id with current date-time to the highest resoltion available.

    That was my first idea too. Then I remembered that the Linux kernel has experimental support for PID namespaces, which means that it's possible to have arbitrary PIDs per user, and the same PID in two programs that are run by different users at the same time.

    If you don't care about such exotic cases, you should be fine.

Re: Generate a unique ID
by Ratazong (Prior) on Nov 15, 2010 at 09:33 UTC

    If your program will only run on one system you might also consider a counter stored in a file. Each time the program starts it will access the file, increase the counter and close it again. For handling concurrent runs of your program, you would just sleep(1) if/while your counter-file is locked for writing.

    If your programs are running on several systems you might use the IP/MAC-address in conjunction with the counter and have a counter-file on each system....

    HTH, Rata
Re: Generate a unique ID
by JavaFan (Canon) on Nov 15, 2010 at 10:05 UTC
    I'm looking to generate a unique ID for a given run of a program, that will distinguish it from any other--previous, subsequent or concurrent--run of that program on that same system.
    A counter seems like the most obvious choice. Since you all you care about is processes on a single system, keeping the counter in a file will work. Otherwise, I'd use a database, or a small CGI program to hand out counters.
    I'm thinking to combine: process id with current date-time to the highest resolution available. Is that enough?
    In practice, on an OS that assigns PIDs increasingly (wrapping around when a limit has reached), yes. In theory, you have a clash. And if you don't do the "date/time" properly, you'll have a potential problem around daylight saving times change-over.
Re: Generate a unique ID
by DrHyde (Prior) on Nov 15, 2010 at 10:31 UTC
    Why avoid UUIDs, when they do exactly what you've asked for and someone else has already written and tested both the algorithm and the code? Data::UUID is the correct answer, unless you have some other constraint that you're not telling us about.

      The basic one is that with any mechanism that has random numbers at its core, there is no guarantee of uniqueness, despite what Data::UUID POD says. (If you look up the RFC it uses the phrase: "is either guaranteed to be different from all other UUIDs/GUIDs generated until 3400 A.D. or extremely likely to be different".

      The probability of collision might be very small, but it still exists. It would therefore be necessary to record enough information to disambiguate between chance collisions. And that means recording all the information that went into generating the number in the first place. As I wouldn't have access to all the information, that isn't possible.

      A second reason is that the timestamping mechanisms used by Data::UUID are broken.

      1. On Windows: It uses QueryPerformanceCounter() api to get a timestamp to the required 100 nanosecond resolution.

        But this high resolution elapsed time counter is known to drift with respect to the system clock, but the module make no attempt at corrections.

        See Time::HiRes for its correction mechanism.

      2. On other systems: it uses GetSystemTime(2).

        But that api is limited to microsecond resolution, so it just multiplies by 10.

      While the spec calls for a once-only randomly initialised, thereafter monotically increasing, sequence component, I can see no provision for storing/retrieving this on windows systems.

      The output of the true_random() function is suspect on systems where rand()is limited to 15 bits.

      As my needs are limited to a single system, using a deterministic value with a self-consistent, high resolution time component will suffice.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        I would say that your chances of collision is very, very, very small, even on Windows.

        I wrote a benchmark once (using Benchmark) that used GUID and UUID (from UUID, not Data::UUID) to generate session keys for me. I ran it through 100k iterations on my windows box and it generated fast (1.1 seconds) and produced 0 collisions. I even had a variant that would cut the result to make the string 16 characters (substr(0, 16)) to save storage space. 0 Collisions in 1.1 seconds.

        I just cut it further to 8 characters and it still produced no collisions.

        Sure installing UUID is a pain on windows, and the code for using it is ugly, but I don't see a reason not to use it.

Re: Generate a unique ID
by GrandFather (Cardinal) on Nov 15, 2010 at 10:58 UTC

    It's probably not relevant, but we have seen time "run backwards" on multi-core systems using AMD processors. Such unusual events have the ability to play hob with any scheme that expects monotonically increasing time and can be surprisingly difficult to debug - some things do happen "once in a blue moon".

    We've only seen this happen at micro-second time scales so I doubt very much it is a concern for your application, but it could be if the same technique were used to generate ids across threads.

    True laziness is hard work
      It's probably not relevant, but we have seen time "run backwards" on multi-core systems using AMD processors.

      Was this using Time::HiRes on a Windows system? If so, I know how that can happen, and I don't think it is constrained to AMD processors, though I seem to remember seeing that Intel have addressed the underlying problem on their latest chips.

      For my purposes, I'll be probably be using a time value limited to milliseconds anyway, but thanks for the heads up.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        It was actually in a C++ application using the high res 'multi media' timer (IIRC) in an application trying to sync clocks on external hardware. The information we found on the web implicated just AMD processors and was consistent with our observations, albeit with a fairly small sample set. I wasn't directly involved with the problem so I don't remember the details, but it was within the last two months and the computers involved were probably less than a year old.

        True laziness is hard work
Re: Generate a unique ID
by LanX (Canon) on Nov 15, 2010 at 13:20 UTC
    > I'm thinking to combine: process id with current date-time to the highest resoltion available. Is that enough?

    If you're not absolutely sure if it's enough, then you should consider some dynamic error detecting mechanism.

    For instance add a "sufficiently" long random number after a delimiter like underscore.

    If ever the "head" part in front of the delimiter should be identical while the random "tail" part is different you can raise an error message. The length of the random part will decide about the probability to detect such a problem.

    Of course I second Corions suggestion to add a host part in the ID-head.

    Cheers Rolf

      The length of the random part will decide about the probability to detect such a problem.

      Whilst you're right about the probability, the question still arises, do you try to detect the problem?

      Ie. Would you be happy with a sort utility that had the possibility to produce wrong results occasionally?

      If so, what probability would you accept?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        You missed the point:

        Detecting the existence of a problem gives you the chance to improve the algorithm that you expected to be perfect!

        Think about the recent problems with Rolls Royce jet engines in the Airbus A380. The redundancy of this airplane - it can still safely fly with only 2 of 4 engines working - helped discovering that this totally unexpected problem exists. Now the engineers can either try to improve those engines or Airbus can switch to another supplier.

        But without the redundancy they would have no clue now, why the airbus might have crashed.

        Absolute security is an illusion.

        Cheers Rolf

Re: Generate a unique ID
by sundialsvc4 (Abbot) on Nov 15, 2010 at 13:24 UTC

    I think you have three choices:

    1. Timestamps, if you know that all clocks are synchronized.
    2. Serially incrementing numbers, if you have a known interlocking mechanism.
    3. Pseudo-random numbers, such as UUIDs, which are, after all, designed for just this purpose.   (You can, as I have opined before, prepend a human-readable string to these, without compromising their randomness.)

    Consider whether the “unique identifier” would benefit from having some “human readable” content, such as a timestamp would have ... whether the unique id needs to be “unique across systems” ... what your practical timing considerations are.

Re: Generate a unique ID
by afoken (Parson) on Nov 15, 2010 at 15:39 UTC

    If you have a RDBMS nearby, delegate the problem to the database. Create a sequence and fetch a value from it for each running instance. Sure, it looks like overkill, but all major RDBMS have solved all of those nasty litte problems (like parallel access from several systems, atomic updates of the sequence value, running on multiple CPUs in parallel, locking ...).

    Actually, I wonder why you would need that unique ID outside a database.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      The application requires temporary spill files that need to have unique names that prevent one copy of the program picking up the wrong files. Whether they've been left lying around from a previous aborted run, or because two copies are running concurrently.

      Installing a DB just for this purpose is not a viable option.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        that prevent one copy of the program picking up the wrong files

        I solved this once using "semaphore files", if a copy of a program picks up a file it checks if the semaphore is there. If it is, it skips the file, if not it creates one. After processing the file it is removed/moved to another directory and the semaphore file deleted. You need some bookkeeping to make it safe.

        It's not clear to me why you need the "uniqueness" at all? If you don't want the slightest chance of a collision I think only the "lastId + 1" remains. You need some sort of locking mechanism of course to control access to the value, e.g. prevent a dirty read. Both timestamps and UUID-like things are not 100% safe. On the other hand, reading through UUID, I would say the chance is probably too small to bother.

        Cheers

        Harry

        The application requires temporary spill files that need to have unique names

        Is there a reason why you can't use File::Temp?

Re: Generate a unique ID
by sundialsvc4 (Abbot) on Nov 15, 2010 at 18:21 UTC

    As you further describe this as being related to “spill files,” if it is possible to do so (and if this is not already what you are doing), it seems to me that it would be a good idea to set up a randomly-named “spill directory.”   Put all of the spill-files, no matter what they may be named, into that.   When the process ends, it likewise tries to clean up “such-and-such directory, no matter what it contains.”

    This might be a much less pervasive change ... requiring less code in the various programs to be considered and touched as you are implementing this change.   You no longer care about the name of the toys upon the floor; only the name of the playpen.

      set up a randomly-named “spill directory.”

      Using a directory name instead of a prefix is a good notion.

      mkdir returns false if the directory already exists and is atomic. Problem is, how do you distinguish between failure to create because it exists, and failure to create for some other reason? Eg. permissions.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        If you use a random identifier that is long enough, I do believe that it becomes irrelevant to seriously consider “collisions.”   You will have won the Lottery in every state and every country, and retired to a place where you do not have to give a tinker’s dam about computers, long before a collision actually occurs.

        The sequence that you expect to succeed will be:   to create the directory, write some sentinel file into it, and verify that the sentinel file does exist.   If you can do all that, you’re good to go.

        If you cannot create the directory, then I submit that it is safe to assume that the reason is “permissions.”   Even though meteors can fall from the sky and lodge themselves in your microwave oven just to the left of your turkey sandwich, you don’t need to test for them.

        Problem is, how do you distinguish between failure to create because it exists, and failure to create for some other reason?
        You'd look at the error code. If the directory exists, the error will be EEXIST, and if it fails for some other reason, the error will be different.

        So far it looks for me like you trying to reinvent the wheel. You need to create directory with unique name?

        $dir = File::Temp->newdir;

        That's it. If you want, you can specify template for the directory name which will include timestamp or whatever you want. If it return failure, it's for some other reason.

        use -X
        unless (mkdir $dir) { -d $dir && ... dir exists; -w _ || ... not writable by euid -x _ || ... not traversable by euid }
        if you look for atomic test, you can also open files with O_CREAT | O_EXCL. Then, you can just use pid + start time + local counter
        $fh = IO::File->new ($filename, O_EXCL | O_CREAT | ...)
Re: Generate a unique ID
by TheRatKing (Initiate) on Nov 15, 2010 at 19:18 UTC
    I have databases with over 750k entries using UUID for each row and have never ran into a collision -- over a 2 year period. Are you storing this unique ID somewhere? Otherwise why would the same ID matter? If you are storing this ID, then you may be able to just check a newly generated UUID against your database and generate a new one if collision does occur. But you will find that part of your code will be practically dead code anyway because it will not find any collision. So don't bother. Just use UUID, go on to implement something cool first. And if UUIDs are colliding because your program is so popular it is being used millions times per second, then come back to solve this problem.
      I have databases with over 750k entries using UUID for each row and have never ran into a collision -- over a 2 year period.

      How do you know you've never had a collision? Are you checking, if so how?

      And are you using Data::UUID? On Windows?

      If you're using this to give every new record a unique id, why aren't you using an autoincrmenting primary key?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Yes, I do check for collisions because I was also curious.

        I'm not using an autoincrementing primary key because an autoincrementing primary key depends on the state of the table within a given database. My data sits in Google App Engine for production and sits in a postgresql database as the primary source for when I need to curate it. (Processing directly on G.A.E. costs more than transferring the difference periodically.)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://871418]
Approved by Old_Gray_Bear
Front-paged by LanX
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (11)
As of 2014-10-23 13:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (125 votes), past polls