Re: method of ID'ing
by Juerd (Abbot) on Apr 13, 2002 at 18:10 UTC
$$, which I'm thinking can be reset at one point or another.
As long as you're on a system that uses incremental process IDs, you will probably be safe. Incremental PIDs wrap around once the maximum has been reached (I think it's 65535 on my system), but forking that often within a single second is very unlikely.
However, not all systems assign PIDs incrementally. Some choose them at random, and in that case, especially with short-running scripts, there is a greater chance of getting two identical IDs.
My concern doesn't deal with $^T
It contains the start time of the program (read: interpreter), which can cause problems if your interpreter is long-running, as with mod_perl or one of the many fast-CGI setups that avoid forking new interpreters. Better is to use time, which returns the current time.
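To see the difference, here is a small sketch (the sub names are made up for illustration) that builds an ID both ways inside one process. The sleep stands in for a later request handled by the same persistent interpreter:

```perl
use strict;
use warnings;

# $^T is set once, when the interpreter starts; time() is read on every call.
# Under a persistent interpreter (mod_perl and the like), every request sees
# the same $^T, so an ID built from it only varies with $$.
sub id_from_start_time { return join '-', $^T, $$ }
sub id_from_now        { return join '-', time(), $$ }

my $first = id_from_start_time();
sleep 1;    # stand-in for a later request served by the same process
my $later = id_from_start_time();

print 'start-time IDs equal: ', ($first eq $later ? 'yes' : 'no'), "\n";
print 'current-time ID:      ', id_from_now(), "\n";
```

The first line prints "yes": within one process the $^T-based ID never changes, which is exactly the problem once a single interpreter serves many requests.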
- Yes, I reinvent wheels.
- Spam: Visit eurotraQ.
Good point on the $^T bit; I missed that. I'd actually think it was more of a problem with a short-lived task such as a CGI, though. For example, if a CGI took at most ten seconds to run, there'd only be ten possible values, increasing the chance of a clash?
On another note, you may hit $$ problems if the same process wants multiple IDs, but by that point you'd be wise to look at something like Class::Singleton to guarantee a single-point ID manager.
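If you'd rather not pull in a module, a plain closure gives a similar single point of control. This is only a sketch (next_id is a hypothetical name, and it is a closure, not Class::Singleton); it guards against repeats within one process, not across machines:

```perl
use strict;
use warnings;

# A single point of ID generation for one process, sketched with a closure
# rather than Class::Singleton. The per-process counter keeps IDs distinct
# when the same process asks for several within one second.
{
    my $counter = 0;

    sub next_id {
        $counter++;
        return join '-', time(), $$, $counter;
    }
}

my @ids = map { next_id() } 1 .. 3;
print "$_\n" for @ids;
```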
Re: method of ID'ing
by tachyon (Chancellor) on Apr 13, 2002 at 19:37 UTC
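use Digest::MD5;    # successor to the old MD5 module
my $md5 = Digest::MD5->new;
# Mix the PID, a random number and the current time into the digest
$md5->add( join ':', $$, rand(), time() );
my $id = $md5->hexdigest;    # 32 hex characters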
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Does MD5 have a one-to-one relationship between the input and the digest? In other words, is it impossible for two different strings to map to the same MD5 string? If not, you might be introducing a potential for collisions by using it.
It maps every input to one of 2^128 possible values. Since MD5's output is limited to 128 bits, more than one input can map to the same output. Still, the chance that two of the given inputs ever map to the same digest is very small (by the birthday bound, accidental collisions only become likely at around 2^64 entries), and even if they did, I don't believe this code is being used for anything intended to be mission critical.
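A worked version of that estimate, using the standard birthday-bound approximation for N = 2^128 equally likely digests. This assumes MD5 outputs behave like uniform random values, which is reasonable for accidental collisions but not for deliberately crafted ones:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{128}}\right) \approx \frac{n^2}{2^{129}} \qquad (n \ll 2^{64})

n = 10^{9}: \quad p \approx \frac{10^{18}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-21}

p \approx \tfrac{1}{2} \text{ only when } n \approx 1.18 \cdot 2^{64} \approx 2.2 \times 10^{19}
```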
Another issue to consider with MD5, if you're using it for 'important' purposes, is the input itself. MD5 processes data in 512-bit (64-byte) blocks and always pads the last one, but padding adds no secrecy: the digest is only as hard to guess as its input. A PID, a timestamp and rand() carry relatively few unpredictable bits, so for anything security-related you want a larger, less predictable input.
Hope that helped.
-il cylic
Re: method of ID'ing
by gav^ (Curate) on Apr 13, 2002 at 21:12 UTC
use Digest::MD5 qw(md5_hex);
# {} stringifies to an anonymous hash's address, e.g. HASH(0x...)
$id = md5_hex(md5_hex(time . {} . rand() . $$));
gav^
Re: method of ID'ing
by Molt (Chaplain) on Apr 13, 2002 at 18:09 UTC
It all depends on how quickly you're recycling numbers, I think. $$ will wrap around at some point, this is true, but I seriously doubt it will ever wrap and come back to its initial value within the one-second timeframe needed to make this ID non-unique.
I guess that if you want to be truly paranoid, you could look into how the better-coded hit counters work and use that kind of locked file handling to manage your ID. I think this should work in any realistic situation, though.
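Roughly how the locked-counter approach looks, as a sketch (next_counter is a hypothetical name, and the counter file path is up to you): every process that needs an ID increments one shared number while holding an exclusive flock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hit-counter-style ID source: read, increment and rewrite a number in a
# shared file while holding an exclusive lock, so concurrent processes
# never receive the same value.
sub next_counter {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock $path: $!";

    my $n = <$fh>;
    chomp $n if defined $n;
    $n = 0 unless defined $n && $n =~ /^\d+$/;
    $n++;

    seek($fh, 0, 0)  or die "seek $path: $!";
    truncate($fh, 0) or die "truncate $path: $!";
    print {$fh} "$n\n";
    close($fh) or die "close $path: $!";    # flushes and releases the lock
    return $n;
}
```

On its own the counter is already unique for as long as the file survives; the price is one locked file write per ID.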
Re: method of ID'ing
by blakem (Monsignor) on Apr 14, 2002 at 12:57 UTC
One thing not mentioned yet is that the amount of "uniqueness" contained in $$ drops significantly when you scale beyond a single webserver. If you have a group of load-balanced webservers, you no longer have to roll the PID to get duplicate values of $$: two servers can hand out the same PID in the same second.
Even if your whole project is running on a single machine today, a simple timestamp+PID identifier hampers the long-term scalability of your site.
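One way to keep time+PID usable across several servers is to add something per-machine. A sketch using the core Sys::Hostname module; it assumes every server in the pool has a distinct host name, which is a configuration assumption rather than a guarantee:

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Prefix the host name so two load-balanced servers that hand out the same
# PID in the same second still produce different IDs.
sub host_scoped_id {
    return join '-', hostname(), time(), $$;
}

print host_scoped_id(), "\n";
```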
-Blake
...when you scale beyond a single webserver.
If you're using Apache and have mod_unique_id, you can use $ENV{UNIQUE_ID}, which I like a lot.
The link above links to the module documentation, a page that also has detailed information about IDing techniques.
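In a CGI or mod_perl script that looks roughly like the sketch below. mod_unique_id exports its value as the UNIQUE_ID environment variable; the time+PID fallback (and the name request_id) is an illustrative choice so the script still runs outside Apache, not something the module provides:

```perl
use strict;
use warnings;

# Use Apache's mod_unique_id value when the server provides it. The
# time+PID fallback for running outside Apache is an illustrative choice,
# not part of mod_unique_id.
sub request_id {
    my $uid = $ENV{UNIQUE_ID};
    return $uid if defined $uid && length $uid;
    return join '-', time(), $$;
}

print request_id(), "\n";
```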
Re: method of ID'ing
by tmiklas (Hermit) on Apr 14, 2002 at 10:13 UTC
I think it's OK if you have fewer than ~64k or ~32k requests per second (with plain CGI, each request is a new process), and of course you have to be able to answer all those requests really FAST ;-) It depends on the system you use... the PID counter rolls over after some value (~32k or ~64k or something else; you have to check it). I don't think I'll ever have such traffic on my sites ;) so I use this method frequently. Hmm... how about using Time::HiRes to increase precision of $^T (in reasonable cases of course)?!
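On Linux, "you have to check it" can be done by reading the kernel's pid_max setting. A Linux-only sketch (pid_max here is a hypothetical helper name; other systems expose the limit differently or not at all):

```perl
use strict;
use warnings;

# Linux-specific: the kernel exposes the PID wrap-around point in /proc.
# Returns undef where that file doesn't exist (other Unixes, Windows).
sub pid_max {
    my $path = '/proc/sys/kernel/pid_max';
    open(my $fh, '<', $path) or return undef;
    my $max = <$fh>;
    close $fh;
    return undef unless defined $max;
    chomp $max;
    return $max =~ /^\d+$/ ? $max : undef;
}

my $max = pid_max();
print defined $max ? "PIDs wrap around at $max\n" : "pid_max not available here\n";
```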
Greetz, Tom.
Hmmm... how about using Time::HiRes to increase precision of $^T (in reasonable cases of course)?!
$^T is set before Time::HiRes can be loaded, so it won't make a difference. $^T is not a magic variable that returns the current time; it is set once, when the interpreter starts (which can cause a lot of trouble when running under mod_perl, irssi, or any other long-running Perl embedder).
Don't use $^T for ID purposes; use time instead. To update existing scripts (though this might break scripts that depend on $^T not changing), you could use:
package Tie::Time;
use strict;
use Carp;

# Tied-scalar interface: every read of the tied variable calls time().
sub TIESCALAR { bless \my $dummy, shift }
sub STORE     { croak 'Cannot set time this way' }
sub FETCH     { time }

1;

=head1 NAME

Tie::Time - Have a scalar return the current time()

=head1 SYNOPSIS

    tie my $time, 'Tie::Time';    # New variable
    tie $^T, 'Tie::Time';         # Override existing $^T

=head1 DESCRIPTION

Guess :)

=head1 URL

http://perlmonks.org/?node_id=158912

=cut
How about adding require Tie::Scalar and throwing this in the Code Catacombs? This is a nice, simple answer to retrieving the current time.
~Particle ;Þ
As I see, your answers always get to the point... ;-) I've never used $^T for this task, always time(). Besides, for some time now I've been using the unique_id provided by Apache ;-) and since then I have almost nothing to worry about ;-).
Greetz, Tom.
Re: method of ID'ing
by roboslug (Sexton) on Apr 15, 2002 at 04:28 UTC
Time is an illusion. Lunchtime doubly so.
- Douglas Adams
Parham,
I used a similar method on a network daemon, and it worked very well until one day a "backup" time server was put into place that was not set to the right time. All of the nodes running the daemon migrated to this new time server because it more closely matched their own clocks (PST rather than PDT), and the next thing you know, IDs were getting reused and all hell broke loose.
After examining time sync protocols, I also think there may be some error margin at startup, where the time fluctuates up and down as it adjusts to match the time server. I could be wrong here.
So, my notes about unique IDs are as follows:
* If you plan to use time(), use Time::HiRes instead. It provides more uniqueness and also seems to execute faster than time().
* If you hash (i.e., MD5), I would use SHA1 instead and remember to add buffer (padding) to the input. I really see little reason to hash unless you prefer the string format of a hash. I avoid hashing when it isn't necessary because of the computation time involved.
* Add an internal increment... sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC, reset so you don't get absurdly long numbers over time. Store $inc in a file if you need to maintain persistence or allow other instances to grab it: load $inc, $inc++, save $inc. Remember to flock.
* If you want it to survive distributed systems (load balanced or whatever), attach a hostname, IP, or MAC address. A MAC address will protect you from "admin" mistakes.
* A random value is an OK thing to add to your string, but you shouldn't need it, and since it is only "somewhat" random, it doesn't help much more than time+PID+inc.
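* And/or, if you really want to make sure nothing "bad" happens, store the ID and do a check. A quick way to do this is to create a file in /tmp or a similar-purpose area; opening it with O_EXCL makes the existence check and the creation one atomic step, so two processes can't both claim the same ID:
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
my ($id, $fh, $ok);
do {
    $id = generate_id();    # hypothetical: your ID-generation code goes here
    $ok = sysopen($fh, "/tmp/$id", O_WRONLY | O_CREAT | O_EXCL);
} until $ok or !$!{EEXIST};    # retry only if the ID is already taken
$ok or die "cannot create /tmp/$id: $!";
close $fh;    # the empty file now reserves $id
Of course, this gets slow after thousands of IDs have been generated, so be sure to clean house in some fashion as well.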
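Putting several of these notes together, a sketch might look like the following. It uses the host name rather than a MAC address (Sys::Hostname is core, while reading a MAC address portably is not), keeps the counter in memory only, and leaves out the flock'd persistence and the /tmp check; MAXINC and unique_id are illustrative names:

```perl
use strict;
use warnings;
use Time::HiRes ();
use Sys::Hostname qw(hostname);

# Sketch of the recipe above: host name + high-resolution time + PID +
# an in-process counter that wraps at MAXINC (an arbitrary limit chosen
# here for illustration) so the number never grows without bound.
use constant MAXINC => 10_000;

{
    my $inc = 0;

    sub unique_id {
        $inc = ($inc + 1) % MAXINC;
        return sprintf '%s-%.6f-%d-%d',
            hostname(), Time::HiRes::time(), $$, $inc;
    }
}

print unique_id(), "\n" for 1 .. 3;
```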
Benchmark: running Time::HiRes::time, time, each for at least 1 CPU seconds...
Time::HiRes::time:  2 wallclock secs ( 0.82 usr +  0.22 sys =  1.04 CPU) @ 1071260.58/s (n=1114111)
             time:  0 wallclock secs ( 0.70 usr +  0.31 sys =  1.01 CPU) @ 1816838.61/s (n=1835007)
                       Rate Time::HiRes::time              time
Time::HiRes::time 1071261/s                --              -41%
time              1816839/s               70%                --
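The script behind these numbers isn't shown. A comparison of this kind can be produced with the core Benchmark module; this is a sketch, not necessarily the original code, and the figures will differ by machine and OS:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Time::HiRes ();

# A negative count means "run each sub for at least that many CPU
# seconds"; cmpthese then prints the timings and the comparison chart.
my $rows = cmpthese(-1, {
    'Time::HiRes::time' => sub { my $t = Time::HiRes::time() },
    'time'              => sub { my $t = time() },
});
```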
sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC
Try the modulo operator %. Example increments:
($counter += 1) %= 5;      # starting from 0: 1, 2, 3, 4, 0, 1, 2, ...
($counter += 1) %= 256;    # starting from 0: 1..255, 0, 1..255, 0, ...
Actually, you and I are both correct. ;-) What I forgot was that when I benchmarked it, it was under NT. I just ran a benchmark on NT and Linux and got the following results:
NT:
Time::HiRes::time() - timethis 600000: 20 wallclock secs (19.99 usr + 0.00 sys = 19.99 CPU) @ 30015.01/s (n=600000)
time() - timethis 600000: 67 wallclock secs (66.73 usr + 0.00 sys = 66.73 CPU) @ 8991.46/s (n=600000)
Linux:
Time::HiRes::time() - timethis 600000: 3 wallclock secs ( 1.22 usr + 0.24 sys = 1.46 CPU)
time() - timethis 600000: 1 wallclock secs ( 0.27 usr + 0.17 sys = 0.44 CPU)
Anyway, enough of the thread hijacking.
The modulo operator is a great idea; good suggestion.
Re: method of ID'ing
by kappa (Chaplain) on Apr 16, 2002 at 15:03 UTC