
Re^4: Randomness encountered with CGI Session

by afoken (Chancellor)
on Jun 13, 2010 at 08:09 UTC

in reply to Re^3: Randomness encountered with CGI Session
in thread Randomness encountered with CGI Session

CGI::Session with default options uses pseudorandom numbers to generate a unique ID. The actual code from v4.42 in CGI::Session::ID::md5 is ...

    my $md5 = new Digest::MD5();
    $md5->add($$ , time() , rand(time) );
    return $md5->hexdigest();

... and that looks just wrong. The process ID in $$ is not very random, and neither is the current time from time(). Limiting the output of rand() to the value of time() does not make any sense either. Perhaps this was meant to initialise the pseudo-random number generator with the current time, but it doesn't, and that would be a bad idea anyway. Using MD5 cannot add entropy; it can only reduce it further.
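A minimal sketch of the point about rand(time): the argument to rand() only scales the result, it does not seed the generator. The fixed seed 42 and the variable names are arbitrary choices for this illustration and do not come from CGI::Session.

```perl
use strict;
use warnings;

# rand(EXPR) returns a value in [0, EXPR); EXPR is only a scale factor.
# Seeding happens in srand(), called explicitly or implicitly by the
# first rand() in a process.
my $now = time();

srand(42);                        # arbitrary fixed seed, illustration only
my $scaled = rand($now) / $now;   # rand(time), scaled back to [0, 1)

srand(42);                        # same seed again
my $plain = rand();               # plain rand() in [0, 1)

# Both draws come from the same generator state, so they agree up to
# floating-point rounding.
printf "scaled: %.12f\nplain:  %.12f\n", $scaled, $plain;
```

So rand(time) carries no more information than rand() itself; the time value is already fed into MD5 separately.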

There is no piece of code in that ID generator that guarantees that the returned ID is really a unique identifier. In fact, the three lines shown above are the only relevant code in the CGI::Session::ID::md5 module.

The code can generate colliding IDs, but I have no idea how likely that is. One should be able to calculate how many of the 2^128 possible MD5 values can be generated with this code. $$ is probably defined as a 32 bit integer, with the topmost 16 or 17 bits being constantly 0 on most systems, effectively using 15 bits. time() uses 31 bits (a 32 bit signed integer defined to be >=0), with most of the most significant bits being constant for years, and rand() returns a double, i.e. 64 bits. So the input to MD5 is just 15+31+64=110 bits for the entire range of time() from 1970 to 2038.

For one day, time() changes only 86400 times, so time() gives only 17 varying bits, not 31. MD5 input is thus only 15+17+64=96 bits. Unlike the pseudo-random number generator, which also depends on its previous state, MD5 depends only on its input. Given 2^96 different input values, MD5 can generate at most 2^96 different output values. Due to the hashing, there may actually be fewer distinct output values.

For one minute, time() changes 60 times, giving just 6 varying bits. MD5 input is reduced to 15+6+64=85 bits, still assuming that the PID changes wildly with each request and that rand() returns real white noise. For one second, during which modern hardware can still process more than 10^9 instructions, time() contributes just one bit, so MD5 input is 15+1+64=80 bits.

The process ID is usually incremented until it hits an arbitrary upper limit defined in the OS kernel, then it starts again at the lowest free ID. For a sequence of requests to a webserver running a CGI for each request, it is quite safe to assume that the PID will increment by one or a very low number most of the time. So the PID doesn't give 15 bits, but just one, two or perhaps four bits. MD5 input for one second is thus not more than 4+1+64=69 bits.

Now for rand(). I assumed 64 bits, i.e. sizeof(double). But that may be wrong. I don't know how rand() is implemented, and I'm too lazy to search. Assuming 32 bit integer arithmetic, rand() will perhaps generate only 2^32 different values, further reducing MD5 input to 4+1+32=37 bits. With Strawberry Perl 5.10.0, the actual number of bits (perl -V:randbits) is just 15, while perl 5.10.0 on a 32 bit Slackware 13.0 gives 48 bits. So, with that Strawberry Perl, MD5 input would be just 4+1+15=20 bits.
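The per-second estimate above can be redone for whatever perl is at hand. This is only a rough sketch: the 4 PID bits and 1 time bit are the guesses from the text, not measurements, and collision_estimate() is a hypothetical helper that applies the usual birthday approximation, which assumes the inputs are uniformly distributed (they are not) and is only meaningful while the result is small.

```perl
use strict;
use warnings;
use Config;

# Bits of rand() output on this perl, same value as `perl -V:randbits`.
my $randbits = $Config{randbits};

# Assumptions taken from the estimate in the text (guesses, not measured):
my $pid_bits  = 4;   # PID mostly increments by a small amount per request
my $time_bits = 1;   # time() takes at most two values within one second

my $input_bits = $pid_bits + $time_bits + $randbits;
printf "randbits=%d, estimated MD5 input per second: %d bits\n",
    $randbits, $input_bits;

# Birthday approximation: the chance of at least one collision among $n
# IDs drawn uniformly from 2**$bits values is roughly n*(n-1)/2 / 2**bits.
sub collision_estimate {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / 2 / 2**$bits;
}

printf "%d IDs in that second: p(collision) ~ %.3g\n",
    $_, collision_estimate($_, $input_bits) for 10, 100, 1000;
```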

Note that the ID does not depend at all on the incoming request. It only depends on the process ID (which is typically a regularly overflowing unsigned integer), the current time, and the state of the pseudo-random number generator.

CGI::Session::ID::incr uses a flock()ed file to return a constantly increased ID, using this code:

    my $IDFile = $args->{IDFile} or croak "Don't know where to store the id";
    my $IDIncr = $args->{IDIncr} || 1;
    my $IDInit = $args->{IDInit} || 0;

    sysopen(FH, $IDFile, O_RDWR|O_CREAT, 0666) or return $self->set_error("Couldn't open IDFile=>$IDFile: $!");
    flock(FH, LOCK_EX) or return $self->set_error("Couldn't lock IDFile=>$IDFile: $!");
    my $ID = <FH> || $IDInit;
    seek(FH, 0, 0) or return $self->set_error("Couldn't seek IDFile=>$IDFile: $!");
    truncate(FH, 0) or return $self->set_error("Couldn't truncate IDFile=>$IDFile: $!");
    $ID += $IDIncr;
    print FH $ID;
    close(FH) or return $self->set_error("Couldn't close IDFile=>$IDFile: $!");
    return $ID;

Note that the caller is responsible for error checking here, but that's not the problem. The IDs are easily guessable, making the script vulnerable. As a side note, the code should not use FH, but my $fh instead, so that the file is properly unlocked and closed when an error occurs.
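A rough sketch of the same counter with a lexical filehandle. next_id() is a hypothetical stand-alone function, not the CGI::Session API; it returns undef on failure and leaves the reason in $! instead of calling set_error(). It fixes only the handle leak, not the guessable IDs.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Hypothetical helper, not part of CGI::Session: returns the next ID
# stored in $id_file, or undef on failure (reason in $!).
sub next_id {
    my ($id_file, $incr, $init) = @_;
    $incr ||= 1;
    $init ||= 0;

    # Lexical handle: closed (and thereby unlocked) as soon as $fh goes
    # out of scope, including on every early return below.
    sysopen(my $fh, $id_file, O_RDWR | O_CREAT, 0666) or return;
    flock($fh, LOCK_EX)  or return;
    my $id = <$fh> || $init;      # empty file: start from $init
    seek($fh, 0, 0)      or return;
    truncate($fh, 0)     or return;
    $id += $incr;
    print {$fh} $id      or return;
    close($fh)           or return;
    return $id;
}
```

With the bareword FH, a failed seek() or truncate() returns while the global handle is still open and locked; with my $fh, leaving the sub releases both.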

CGI::Session::ID::static returns a constant ID supplied by the caller. The documentation shows only one example, $session = new CGI::Session("id:static", $ENV{REMOTE_ADDR});. This is even worse outside a controlled network. Many users work behind proxies, so you may have several thousand users with the same REMOTE_ADDR (that of the proxy). AOL once had a set of proxies configured in a way that each new request went through a random proxy, so the REMOTE_ADDR was not constant for a user, and it was shared by all AOL users. I don't know if this setup is still in use, and I don't care.

From what I saw studying the CGI::Session code, there is no secure and reliable ID generator available. The documentation has absolutely no information about how secure and how reliable each of the generators is, and which generator to use in which situation. That's very sad.
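For contrast, a minimal sketch of an ID generator that does not depend on $$, time() or rand(). random_session_id() is a hypothetical function, not something CGI::Session provides, and it assumes a Unix-like system where /dev/urandom is backed by the kernel's cryptographic random number generator (this will not work on Windows).

```perl
use strict;
use warnings;

# Hypothetical generator, not part of CGI::Session. Assumes /dev/urandom
# exists and delivers unpredictable bytes.
sub random_session_id {
    my ($bytes) = @_;
    $bytes ||= 16;                      # 16 bytes = 128 bits

    open(my $fh, '<:raw', '/dev/urandom')
        or die "Cannot open /dev/urandom: $!";
    my $got = read($fh, my $buf, $bytes);
    close($fh);
    die "Short read from /dev/urandom"
        unless defined $got && $got == $bytes;

    return unpack('H*', $buf);          # hex string, two chars per byte
}

print random_session_id(), "\n";
```

Random 128 bit IDs make collisions and guessing improbable, not impossible. A hard uniqueness guarantee would still need a check against the session storage.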

A quick look at the storage drivers CGI::Session::Driver::DBI (base class), CGI::Session::Driver::mysql, CGI::Session::Driver::postgresql, and CGI::Session::Driver::sqlite shows another nasty surprise: While all of those databases provide at least one way to generate a reliable, unique ID, CGI::Session does not use this advantage. Instead, it relies on the problematic ID generators shown above.


Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^5: Randomness encountered with CGI Session
by wfsp (Abbot) on Jun 13, 2010 at 09:54 UTC
    Well, that's certainly food for thought and has shaken my complacency.

    Many thanks for the detailed reply.

Re^5: Randomness encountered with CGI Session
by Anonymous Monk on Jun 13, 2010 at 09:25 UTC
      It looks like the author copied the idea from Apache::Session::Generate::MD5.

      Perhaps the idea, but neither algorithm nor source. Apache::Session::Generate::MD5 uses substr(Digest::MD5::md5_hex(Digest::MD5::md5_hex(time(). {}. rand(). $$)), 0, $length), with $length initialised to 32. It has the same problems with time(), $$, and rand(). Due to the use of the concat operator, rand() is converted to a string, where most of the bits are constant (0-9 differ only in the last four bits), but the string is a lot more bits long. This difference should not really matter for MD5 hashing; rand() will give about 2^RANDBITS different values, perhaps only 2^RANDBITS-1 due to rounding. (Ab-)using the address of an anonymous reference as another entropy source is a nice idea, but how does perl (and the OS) randomize the address? Running perl -e 'print "".{}' on my Strawberry installation returns ZERO random bits, the value is constantly HASH(0x3f9b9c). On Slackware 13.0, I see different values, perl -e 'system $^X,-E=>q[say "".{}] for 1..1000'|sort -u|wc -l gives 936. Not too bad. But where does the entropy used to randomize the address come from? From the same source used for rand()? That would be pretty bad.

      Because md5_hex() always returns 32 chars, substr is pretty useless. But the surrounding code may reduce $length, making colliding IDs more probable.

      CGI::Session::ID::uuid appears to use better algorithms.

      At least, there are short comments in the code about the external UUID generators used. Too bad they aren't shown in the documentation.

      Using time-based UUIDs (v1 and v2) gives a new, unique ID every 100 ns, that should be sufficient for a session ID. <update>Of course, most bits of those UUIDs can be guessed by an attacker, so using them directly as a session ID would be a bad idea.</update> The other UUID variants are either constant (name-based, v3 and v5) or depend on a random number generator (v4). When that generator is a pseudo-random number generator, the quality of the UUID depends on the quality of the pseudo-random number generator implementation.


