method of ID'ing

Parham has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: method of ID'ing by Juerd (Abbot) on Apr 13, 2002 at 18:10 UTC
"$$ which i'm thinking can be reset at one point or another."

As long as you're on a system that uses incremental process IDs, you will probably be safe. Incremental process IDs wrap around when the maximum has been reached (I think it's 65535 on my system), but forking that often in a single second is very unlikely. However, not all PIDs are simply incremental. Some are chosen randomly, and in that case, especially with short-running scripts, you have a greater chance of getting two identical IDs.

"My concern doesn't deal with $^T"

It contains the start time of the program (read: interpreter), which can cause problems if your interpreter is long-running, like mod_perl or one of the many fast-perl-CGI things that avoid forking interpreters. It is better to use time, which returns the current time.

- Yes, I reinvent wheels. - Spam: Visit eurotraQ.
Re: Re: method of ID'ing by Molt (Chaplain) on Apr 13, 2002 at 19:10 UTC
Good point on the $^T bit, I missed that. I'd actually think it was more of a problem with a short-lived task such as a CGI, though. For example, if a CGI took a maximum of ten seconds to run then there'd only be ten possible values, increasing the possibility of a clash? On another note, you may hit $$ problems if the same process wants multiple IDs, but by this point you'd be wise to look at something like Class::Singleton to guarantee a single-point ID manager.
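To make the "same process wants multiple IDs" problem concrete, here is a minimal sketch. It assumes the scheme under discussion is simply a timestamp joined to the process ID; `naive_id` is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the timestamp is passed in so that the clash is
# reproducible instead of depending on the wall clock.
sub naive_id {
    my ($now) = @_;
    return "$now-$$";
}

# Two requests from the same process, usually within the same second,
# so usually the same ID.
my $first  = naive_id(time());
my $second = naive_id(time());
print "first:  $first\nsecond: $second\n";
```

Whenever both calls land in the same second, the two IDs are identical, which is the case a single-point ID manager would have to handle.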
Re: method of ID'ing by tachyon (Chancellor) on Apr 13, 2002 at 19:37 UTC
`use MD5; my $MD5 = new MD5(); $MD5->add( $$ . rand() . time() ); my $id = $MD5->hexdigest;`

cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
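The old MD5 module has since been superseded by Digest::MD5. A minimal sketch of the same idea with that module, using the same ingredients as the snippet above (PID, a random number, the current time), might look like this:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Concatenate PID, a random number and the current time, then hash the result.
my $id = md5_hex( $$ . rand() . time() );
print "$id\n";    # 32 lowercase hex characters
```

The hash only reshapes the input into a fixed-length string; any uniqueness still comes from the three values fed into it.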
Re: Re: method of ID'ing by ehdonhon (Curate) on Apr 13, 2002 at 21:29 UTC
Does MD5 have a one-to-one relationship between the plaintext and the ciphertext? In other words, is it impossible for two different strings to map to the same MD5 string? If not, you might be introducing a potential for collisions by using it.
Re: Re: Re: method of ID'ing by tachyon (Chancellor) on Apr 13, 2002 at 22:11 UTC
So they say. Check out the unofficial MD5 homepage here or read the full RFC 1321.

cheers tachyon
Re: Re: Re: Re: method of ID'ing by ariels (Curate) on Apr 14, 2002 at 10:25 UTC
Re: Re: Re: Re: Re: method of ID'ing by tachyon (Chancellor) on Apr 14, 2002 at 12:41 UTC
Re: Re: Re: method of ID'ing by ilcylic (Scribe) on Apr 15, 2002 at 01:01 UTC
It has a one to (one of 2 to the 128th) possible values. Since the output of MD5 is limited to a 128-bit string, it is possible for more than one input to map to the same output value. The chance that two of the given inputs would ever map to the same string is very small (unless the number of entries were a statistically significant fraction of 2^128), and even if they did, I don't believe this code is being used for something which is intended to be mission critical.

Another issue to consider with MD5 is that the input value needs to be fairly large if you're using it for 'important' purposes. Since MD5 operates on 512-bit blocks, and pads the input otherwise, it's important to make sure you have at least one full block, to retain computational protection.

Hope that helped. -il cylic
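To put a rough number on "a very small chance", the usual birthday approximation can be sketched as follows. This assumes the digests behave like uniformly random 128-bit values, which is an idealization, and `collision_estimate` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Birthday approximation: with n random values drawn from a space of 2**bits,
# the chance that any two coincide is roughly n*(n-1)/2 divided by 2**bits.
# Only meaningful while the result is much smaller than 1.
sub collision_estimate {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / 2 / 2**$bits;
}

my $p = collision_estimate(1e9, 128);    # one billion IDs
printf "approximate collision probability: %.3g\n", $p;
```

Even for a billion IDs the estimate stays many orders of magnitude below anything a timestamp-plus-PID scheme is likely to achieve on its own.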
Re: method of ID'ing by gav^ (Curate) on Apr 13, 2002 at 21:12 UTC
As stolen from Apache::Session: `$id = MD5->hexhash(MD5->hexhash(time.{}.rand().$$));`

gav^
Re: method of ID'ing by Molt (Chaplain) on Apr 13, 2002 at 18:09 UTC
It all depends on how quickly you're recycling numbers, I think. $$ will be reset at some point, this is true, but I seriously doubt it'll ever get reset and come back to its initial value within the one-second timeframe needed to stop this being a unique ID. I guess that if you want to be truly paranoid you could look into how the better-coded hit counters work and use that kind of file handling to manage your ID. I think this should work in any realistic situation, though.
Re: method of ID'ing by blakem (Monsignor) on Apr 14, 2002 at 12:57 UTC
One thing not mentioned yet is that the amount of "uniqueness" contained in $$ drops significantly when you scale beyond a single webserver. If you have a group of load-balanced webservers, you no longer have to roll the PID to get duplicate values of $$. Even if your whole project is running on a single machine today, a simple timestamp+pid identifier hampers the long-term scalability of your site.

-Blake
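One way to address the multi-server concern is to put a per-machine component into the ID. A minimal sketch, assuming every machine in the pool reports a distinct hostname (which is an assumption about the deployment, not something Perl guarantees):

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Host name, then timestamp, then PID: two servers with the same PID in the
# same second still produce different strings, as long as their names differ.
my $id = join '-', hostname(), time(), $$;
print "$id\n";
```

This does nothing for repeated requests inside one process within one second; it only removes the cross-machine clash.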
Re: Re: method of ID'ing by Juerd (Abbot) on Apr 14, 2002 at 13:14 UTC
"webserver."

If you're using Apache and have mod_unique_id, you can use $ENV{UNIQUE_ID}, which I like a lot. The link above points to the module documentation, a page that also has detailed information about IDing techniques.
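A short sketch of how a script might pick up that variable. The fallback branch is an assumption added for runs outside Apache and is not part of mod_unique_id; `request_id` is a hypothetical name.

```perl
use strict;
use warnings;

# Prefer the server-provided ID when mod_unique_id has set it; otherwise fall
# back to timestamp plus PID (assumed fallback, weaker than the server's ID).
sub request_id {
    my $from_server = $ENV{UNIQUE_ID};
    return $from_server if defined $from_server && length $from_server;
    return join '-', time(), $$;
}

print request_id(), "\n";
```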
Re: method of ID'ing by tmiklas (Hermit) on Apr 14, 2002 at 10:13 UTC
I think it's OK if you have fewer than ~64k or ~32k requests per second (and of course you have to be able to answer all those requests really FAST) ;-) It depends on the system you use... The PID counter rolls over after some value (~32k or ~64k or other, you have to check it). I don't think that I'll ever have such traffic on my sites ;) so I use this method frequently. Hmmm... how about using Time::HiRes to increase precision of `$^T` (in reasonable cases of course)?!

Greetz, Tom.
Re: Re: method of ID'ing by Juerd (Abbot) on Apr 14, 2002 at 10:34 UTC
"Hmmm... how about using Time::HiRes to increase precision of $^T (in reasonable cases of course)?!"

$^T is set before Time::HiRes can be loaded, so it won't make a difference. $^T is not a magic variable that issues time; it is set when the interpreter starts (which can cause a lot of trouble when running under mod_perl, irssi, or any other long-term perl embedder). Don't use $^T for IDing purposes, use time instead. To update existing scripts (but it might break some that depend on $^T not changing), you could use:

`package Tie::Time; use Carp; use strict; sub TIESCALAR { bless \my $dummy, shift } sub STORE { croak 'Cannot set time this way' } sub FETCH { time } =head1 NAME Tie::Time - Have a scalar return the current time() =head1 SYNOPSIS tie my $time, 'Tie::Time'; # New variable tie $^T, 'Tie::Time'; # Override existing $^T =head1 DESCRIPTION Guess :) =head1 URL http://perlmonks.org/?node_id=158912 =cut`
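For readers who want to try the tied-scalar idea, here is a self-contained sketch along the same lines. The package name `Local::TieTime` is hypothetical and chosen so that it does not claim to be the module posted above; the sketch ties only a new lexical variable and leaves $^T alone.

```perl
use strict;
use warnings;

{
    package Local::TieTime;    # hypothetical name for this sketch

    sub TIESCALAR { my $class = shift; my $dummy; return bless \$dummy, $class }
    sub FETCH     { return time() }                       # every read asks the clock
    sub STORE     { die "Cannot set time this way\n" }    # refuse assignment
}

tie my $now, 'Local::TieTime';

print "current time:      $now\n";    # re-evaluated on each read
print "interpreter start: $^T\n";     # fixed when perl started
```

In a short script the two numbers are normally equal; they drift apart only in a long-running interpreter, which is the mod_perl case described above.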
Re: Re: Re: method of ID'ing by particle (Vicar) on Apr 14, 2002 at 13:27 UTC
How about adding `require Tie::Scalar`, and throwing this in the code catacombs? This is a nice, simple answer to retrieving the current time.

~Particle
Re: Re: Re: Re: method of ID'ing by Juerd (Abbot) on Apr 14, 2002 at 13:48 UTC
Re^5: method of ID'ing by particle (Vicar) on Apr 14, 2002 at 15:25 UTC
Re: Re: Re: method of ID'ing by tmiklas (Hermit) on Apr 14, 2002 at 17:09 UTC
As I see it, your answers always get to the point... ;-) I've never used `$^T` for this task, always time(). Besides, for some time now I have been using the `unique_id` provided with Apache ;-) and since then have had almost nothing to worry about ;-).

Greetz, Tom.
Re: method of ID'ing by roboslug (Sexton) on Apr 15, 2002 at 04:28 UTC
Time is an illusion. Lunchtime doubly so. - Douglas Adams

Parham, I used a similar method on a network daemon and it worked very well until one day a "backup" time server was put into place that was not set to the right time. All of the nodes running the daemon migrated to this new time server because it more closely matched their time (PST, not PDT), and the next thing you know, IDs are getting re-used and all hell breaks loose. After examining time sync protocols, I also think there may be some error margin at startup, where the time fluctuates up and down as it adjusts to match the time server. I could be wrong here.

So, my notes about unique IDs are as follows:

* If you plan to use time(), use Time::HiRes instead. It provides more uniqueness and also seems to execute faster than time().
* If you hash (i.e., MD5), I would use SHA1 instead and remember to add buffer. I really see little reason to hash unless you prefer the string format of a hash. I avoid hashing when it isn't necessary due to the calculation time involved.
* Add an internal increment... sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC, reset so you don't get absurdly long numbers over time. Store the $inc to a file if you need to maintain persistence or allow other instances to grab it. Load $inc; $inc++; Save $inc. Remember to flock.
* If you want to make it survive distributed systems (load balanced or whatever), attach a hostname, IP, or MAC address. A MAC address will protect you from "admin" mistakes.
* Random is an OK thing to add to your string, but you shouldn't need it, and since it is only "somewhat" random, it doesn't help much more than time+PID+inc.
* And/or, if you really want to make sure nothing "bad" happens, store the ID and do a check. A quick way to do this is to make a file in /tmp or a similar-purpose area and do something like: `do { [ generate ID code ] } while (-e "/tmp/$ID"); [ create empty /tmp/$ID file ]`

Of course, this gets slow after thousands of IDs have been generated, so be sure to clean house in some fashion as well.
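The check-then-create loop above leaves a small window between the `-e` test and the file creation in which another process can claim the same ID. A variation that swaps in a different technique, an exclusive create via `sysopen` with `O_EXCL`, can be sketched as follows. `claim_id` is a hypothetical helper, and the ID format is only an example.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Spec;

# Hypothetical helper: returns 1 only if this call created the marker file,
# i.e. nobody had claimed $id in $dir before. O_EXCL makes "check and create"
# a single step, so two processes cannot both succeed for the same name.
sub claim_id {
    my ($dir, $id) = @_;
    my $path = File::Spec->catfile($dir, $id);
    if (sysopen(my $fh, $path, O_CREAT | O_EXCL | O_WRONLY, 0600)) {
        close $fh;
        return 1;
    }
    return 0 if $!{EEXIST};               # already taken: caller should retry
    die "Cannot create $path: $!\n";      # any other error is a real failure
}

my $dir = File::Spec->tmpdir;
my ($id, $attempt) = (undef, 0);
do {
    $id = join '-', time(), $$, $attempt++;
} until claim_id($dir, $id);

print "claimed $id\n";
```

The marker files still accumulate, so the housekeeping advice above applies unchanged.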
Re: Re: method of ID'ing by Juerd (Abbot) on Apr 15, 2002 at 07:36 UTC
"* If you plan to use time(), use Time::HiRes instead. It provides more uniqueness and also seems to execute faster than time()."

Time::HiRes::time indeed provides more uniqueness, but it is not faster:

`Benchmark: running Time::HiRes::time, time, each for at least 1 CPU seconds... Time::HiRes::time: 2 wallclock secs ( 0.82 usr + 0.22 sys = 1.04 CPU) @ 1071260.58/s (n=1114111) time: 0 wallclock secs ( 0.70 usr + 0.31 sys = 1.01 CPU) @ 1816838.61/s (n=1835007) Rate Time::HiRes::time time Time::HiRes::time 1071261/s -- -41% time 1816839/s 70% --`

"sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC"

Try the modulo operator `%`. Example increments:

`($counter += 1) %= 5; # 0, 1, 2, 3, 4, 0, 1, 2..4, 0..4, 0..4, ...`

`($counter += 1) %= 256; # 0..255, 0..255, ...`
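Putting the two suggestions together, a sketch of an in-process ID generator that combines a sub-second timestamp with a wrapping counter could look like this. `$MAXINC`, `next_id` and the output format are assumptions made for illustration.

```perl
use strict;
use warnings;
use Time::HiRes ();

my $MAXINC  = 256;    # assumed wrap point, matching the second example above
my $counter = 0;

sub next_id {
    ($counter += 1) %= $MAXINC;    # wraps back to 0 after MAXINC - 1
    # Microsecond timestamp, PID, then the counter as a tie-breaker.
    return sprintf '%.6f-%d-%d', Time::HiRes::time(), $$, $counter;
}

print next_id(), "\n" for 1 .. 3;
```

Within one process, two IDs can only coincide if the counter wraps all the way around while the clock reading stays the same; across processes or machines, the PID and any host component have to carry the difference.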
Re: Re: Re: method of ID'ing by Anonymous Monk on Apr 15, 2002 at 10:16 UTC
Actually, you and I are both correct. ;-) What I forgot was that the time I benchmarked it, it was under NT. I just did a benchmark on NT and Linux and got the following results:

NT:
Time::HiRes::time() - timethis 600000: 20 wallclock secs (19.99 usr + 0.00 sys = 19.99 CPU) @ 30015.01/s (n=600000)
time() - timethis 600000: 67 wallclock secs (66.73 usr + 0.00 sys = 66.73 CPU) @ 8991.46/s (n=600000)

Linux:
Time::HiRes::time() - timethis 600000: 3 wallclock secs ( 1.22 usr + 0.24 sys = 1.46 CPU)
time() - timethis 600000: 1 wallclock secs ( 0.27 usr + 0.17 sys = 0.44 CPU)

Anyway, enough of the thread hijacking. The modulo operator is a great idea, good suggestion.
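Since the result evidently depends on the platform, it is easy to repeat the comparison locally. A minimal sketch with the core Benchmark module; the labels are arbitrary and the numbers will differ from those quoted in this thread:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Time::HiRes ();

# Run each candidate for at least 1 CPU second, then print a comparison table.
my $rows = cmpthese(-1, {
    'time'              => sub { my $t = time() },
    'Time::HiRes::time' => sub { my $t = Time::HiRes::time() },
});
```

For ID generation either call is far cheaper than the rest of a typical request, so the extra resolution is usually the more relevant difference.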
Re: method of ID'ing by kappa (Chaplain) on Apr 16, 2002 at 15:03 UTC
Just a little comment about hashes. I want to second roboslug and say that there's no use in using a cryptographically strong hash unless there's a chance that your users can enter the ID themselves (think session ID in a URL). These functions are usually computationally hard (and usually by design) and add nothing in terms of randomness (and therefore uniqueness) to your ID. But beware of users guessing your time()-based IDs in URLs (or even cookies).
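For the case where users do get to see and submit the ID, one possible approach, sketched here under stated assumptions rather than taken from the thread, is to run the predictable ingredients through a keyed hash so that the result cannot be guessed without the key. The secret below is a placeholder and `session_token` is a hypothetical name.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Assumption: in real use the key comes from protected configuration;
# the literal below is only a placeholder.
my $secret  = 'replace-with-a-long-random-secret';
my $counter = 0;

sub session_token {
    my $raw = join '-', time(), $$, ++$counter;    # unique in-process, but guessable
    return hmac_sha256_hex($raw, $secret);         # hard to predict without the key
}

print session_token(), "\n";
```

The keyed hash adds unpredictability, not uniqueness: the raw string still has to be unique on its own, and whether this is adequate for a given security requirement has to be judged separately.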

