 
PerlMonks  

Re^6: IO::Socket::SSL / Net::SSLeay inefficient in non-blocking mode ?

by Yaribz (Beadle)
on Jun 26, 2022 at 10:13 UTC [id://11145070]


in reply to Re^5: IO::Socket::SSL / Net::SSLeay inefficient in non-blocking mode ?
in thread IO::Socket::SSL / Net::SSLeay inefficient in non-blocking mode ?

Well that's pretty bizarre! So for some reason, attempting to read an incomplete SSL packet takes 3x as long as reading and decoding a packet? Your results point heavily toward a bad design inside libssl itself.

Silly me, sorry, I guess I was really tired yesterday... I made a stupid copy-paste mistake, so the "Avg productive sysread time" and "Avg unproductive sysread time" values were wrong for the non-blocking SSL samples. I have fixed them. That doesn't change the fact, though, that the unproductive sysreads take more time than they should and are, in my opinion, responsible for the high CPU usage.

The first of the three SSL samples I posted is interesting because it has about the same number of sysread calls per second as the non-SSL sample. It means that during this sample there was close to one (unproductive) sysread for each TCP packet received, as in the non-SSL case. This allows us to compare the time of the unproductive SSL sysread calls with that of the non-SSL sysread calls: 6.33 us versus 2.73 us, i.e. more than twice as long for a one-TCP-packet read.
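For the record, the slowdown factor implied by those two averages can be checked in one line (a throwaway awk sketch, not part of my script):

```shell
# Avg unproductive SSL sysread time vs non-SSL sysread time, from the samples above.
awk 'BEGIN { printf "%.2fx slower\n", 6.33 / 2.73 }'
# prints "2.32x slower"
```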


Just to be sure though, what if you make a quick change to your SSLeay.xs to verify that the problem is in SSL_read...

I added your debug traces and let my script run for about a minute and a half, redirecting stderr to a file.

First, of course, it impacted the metrics produced by my script a bit. For example, I no longer see samples with a very high sysread call rate and a productive sysread ratio below 10%. Here is the most common kind of sample produced by my script when using your patched SSLeay binary:

Interval: 1.00s
Avg transfer speed: 117.55 MB/s
Nb sysread calls: 52362
Productive sysread size: 16384 B
Productive sysread ratio: 13.70%
Avg productive sysread time: 21.68 us
Avg unproductive sysread time: 13.37 us
(it looks like the second sample I posted previously, with a similar sysread call rate and productive sysread ratio, except the timings are a bit higher, which makes sense I guess)

And here is a sample of the output produced by your debug traces:

SSL_read(s, buf, 16384) => 16384 (9 us)
SSL_read(s, buf, 16384) => -1 (7 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => 16384 (10 us)
SSL_read(s, buf, 16384) => -1 (6 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => -1 (6 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => 16384 (12 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => -1 (6 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => 16384 (12 us)
SSL_read(s, buf, 16384) => -1 (6 us)
SSL_read(s, buf, 16384) => -1 (3 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => 16384 (12 us)
SSL_read(s, buf, 16384) => -1 (6 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (2 us)
SSL_read(s, buf, 16384) => -1 (1 us)
SSL_read(s, buf, 16384) => -1 (1 us)

We can see there are about six unproductive reads for each productive read at the time the sample was taken. This is the most common pattern from my observations. I found some extreme cases like the following ones, but they are very rare:

SSL_read(s, buf, 16384) => 16384 (12 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => -1 (92 us)
SSL_read(s, buf, 16384) => 16384 (14 us)
...
SSL_read(s, buf, 16384) => 16384 (9 us)
SSL_read(s, buf, 16384) => -1 (5 us)
SSL_read(s, buf, 16384) => -1 (85 us)
SSL_read(s, buf, 16384) => 16384 (13 us)
...
SSL_read(s, buf, 16384) => 16384 (18 us)
SSL_read(s, buf, 16384) => -1 (75 us)
SSL_read(s, buf, 16384) => 16384 (24 us)
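Assuming the trace lines always have the exact shape shown above ("SSL_read(s, buf, 16384) => -1 (7 us)"), the average duration of the unproductive SSL_read calls can be pulled out of the recorded stderr file with a quick grep/awk pass. Just a sketch; the field handling depends on that format being stable:

```shell
# Average duration of the unproductive SSL_read calls, in microseconds.
# In the trace format above, field 6 is "(7"; strip the parenthesis to get the number.
grep ' => -1 ' debug_ssl2.txt \
  | awk '{ gsub(/\(/, "", $6); sum += $6; n++ }
         END { if (n) printf "%d unproductive reads, avg %.2f us\n", n, sum / n }'
```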

Some poor man's metrics regarding the recorded data:

$ grep -c "=> 16384" debug_ssl2.txt
613635
$ grep -c "=> \-1" debug_ssl2.txt
3779378

613635 productive reads
3779378 unproductive reads
=> ~13.97% productive sysread ratio
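The ratio itself can be recomputed from the two grep counts in one step (same numbers as above):

```shell
# Productive sysread ratio from the two counts above.
awk -v p=613635 -v u=3779378 \
  'BEGIN { printf "%.2f%% productive sysread ratio\n", 100 * p / (p + u) }'
# prints "13.97% productive sysread ratio"
```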

That is consistent with what my script is reporting (phew!)

What is NOT consistent, though, is the sysread timing: the SSL_read durations in your traces (mostly 1-7 us for the unproductive calls) are well below the 13.37 us average my script measures, which would confirm the problem doesn't come from libssl itself...
