boazsegev / iodine

iodine - HTTP / WebSockets Server for Ruby with Pub/Sub support

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unstable TLS connection

fgi opened this issue · comments

Hello @boazsegev,

Here is an issue related to TLS connections, I suppose lost or reset state while in use.

System Information

  • OS: FreeBSD 12.1-RELEASE-p3 GENERIC amd64
  • Ruby: 2.7.1p83
  • Version: 0.7.40
  • OpenSSL: 1.1.1d-freebsd 10 Sep 2019

Description

I launch Iodine with following command in a FreeBSD service (run as root):
$GEMDIR/iodine -b X.X.X.X -p 443 -w 1 -tls-cert $CERTDIR/cert.pem -tls-key $CERTDIR/privkey.pem -public $DIR/public -v 2>$DIR/log/iodine.log -verbosity 5 &

Observed behaviors explained here could not be reproduced in HTTP, without TLS.

Behavior with static files

Static path is set at Iodine launch, Ruby should not be involved in static files delivery.

On some static file requests, the TLS connection fails to finish sending data and the process overloads CPU. The connection seems to stop working and is closed with a TLS cleanup after about 1min 37sec.

Example:
First try to load 2.32db9452.chunk.js file, all seems to be okay.

DEBUG (fio_tls_openssl.c:868): Attaching TLS read/write hook for 0x100c (server mode).
DEBUG (fio_tls_openssl.c:345): TLS ALPN set to: http/1.1 for 0x100c
DEBUG (fio_tls_openssl.c:767): Completed TLS handshake for 0x100c
 - - [Thu, 18 Jun 2020 13:16:57 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 0ms
 - - [Thu, 18 Jun 2020 13:16:57 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 249ms
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0x110a
DEBUG (fio.c:327): FD 17 re-initialized (state: 0x110b-closed).
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0xf0e
DEBUG (fio.c:327): FD 15 re-initialized (state: 0xf0f-closed).

A couple more reloads work as well.

 - - [Thu, 18 Jun 2020 13:17:13 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 443ms
 - - [Thu, 18 Jun 2020 13:17:14 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 80ms
 - - [Thu, 18 Jun 2020 13:17:17 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 237ms
 - - [Thu, 18 Jun 2020 13:17:17 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 215ms

One more time, 2.32db9452.chunk.js file delivery is stopped with about one half of data sent, the process overloads the CPU at 100% for about 25-30 seconds:

 - - [Thu, 18 Jun 2020 13:17:21 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 111ms
DEBUG (fio.c:327): FD 17 re-initialized (state: 0x110c-open).
DEBUG (fio_tls_openssl.c:868): Attaching TLS read/write hook for 0x110c (server mode).
DEBUG (fio_tls_openssl.c:345): TLS ALPN set to: http/1.1 for 0x110c
DEBUG (fio_tls_openssl.c:767): Completed TLS handshake for 0x110c

The log says that the file is delivered but it's incomplete.
The browser waits for the end of the file.

After 1 min and 37 sec, the connection seems to be reinitialized or closed:

 - - [Thu, 18 Jun 2020 13:17:22 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 0ms
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0x100c
DEBUG (fio.c:327): FD 16 re-initialized (state: 0x100d-closed).

The browser stops waiting, shows a HTTP code 200 but received file is not complete.

It seems to occur on files bigger than 100-150kb maybe. The one in the example is 200kb.
I could not reproduce it with smaller files, but I could with larger ones, like images.

Behavior with Websockets

I observed something similar with websockets, so I add it to this issue.
I open a websocket, exchange a small amount of data and suddenly or after about 15 minutes, the process starts running 100% of the CPU forever.
The previously opened websocket does not seem to respond.
It's hard to say what is going on in the websocket. I activated Ruby logs in all events but they show no error.
There is nothing in Iodine logs, the server is not crashed but is very slow until I restart it.

I hope this information is enough to analyse the issue.
Don't hesitate to ask if I can help.

Kindly,
Franck

commented

Thank you for opening this issue, @fgi.

I can't reproduce this on my MacOS. I tired downloading a 480Kb file multiple times and they were all successful (using curl).

If you upgrade to OpenSSL 1.1.1g, does the issue persist (that's the version I'm running against)?

Does this happen with a self-signed certificate? (iodine - tls)?

Sadly, I'm swamped and might no get to work on the issue for a while, but I'll do my best to speed things up as soon as I can reproduce the issue.

Kindly,
Bo.

P.S.

I tested using this Ruby script:

def download_file
   `curl --insecure https://localhost:3000/assets/logo/facil_logos_png.zip > tmp.zip`
end

while(true) 
  10.times { Thread.new() { download_file() }  }
  download_file()
  sleep(0.1)
end

I'm still looking. If I find something, I'll update this issue.

I upgraded OpenSSL to 1.1.1g and tried with a self-signed certificate.
Unfortunately, it's not better.

Kindly,
Franck

commented

I'm still can't seem to debug the issue, as I can't recreate it (and I'm also bogged down with other work).

The whole of the implementation concern for OpenSSL is in the Read-Write hooks coded in fio_tls_openssl.c.

The requirements for these read-write hooks and the API is detailed in the fio.h file.

I think the best approach would be to replace the OpenSSL socket API based implementation with an OpenSSL BIO API implementation (buffered IO).

If anyone comes across this issue and can contribute, that would be great. Otherwise it might take longer than I like.

B.

I did more testing based on your Ruby script. I can't recreate the issue with a script using curl or wget.
I can only do it reloading manually a browser, Firefox of Chrome.
I'm still investigating, focusing on file loading in the browser.

commented

Thanks for the update.

If this is a browser related issue, it might not have shown on my tests at all... does it occur with TLS disabled? Is it truly a TLS issue? It might be a pipelining issue 🤔

B.

It does not occur without TLS. Pipelining?

commented

It does not occur without TLS

Good to know.

Pipelining?

It's possible to send a large number of requests without waiting for a response. This is called pipelining.

When facil.io / iodine detect a pipelining client (single connection), they will reschedule the client after a few requests to prevent a DOS attack (allowing other client to perform work).

Unlike a slow client attack, where the throttling is reported to the debug log, pipelining might be a valid client and isn't reported (I should probably change that).

I thought maybe an error occurred somewhere around that point.

That was assuming that when you quickly refresh the page a number of times, the browser's network layer might collect a larger number of requests into its buffer, pipelining all these requests (sending them through the same connection without waiting for a response).

Thank you for the explanation about pipelining.

Finally, I could recreate the issue with curl and reduced it to a minimalistic configuration.
On this server port, only Iodine runs to serve static files. There is no config.ru, no Ruby app.
Iodine launch command:

iodine -b 0.0.0.0 -p 4434 -w 1 -t 1 -tls -public ./public -v 2>./log/iodine.log -verbosity 5 &

You can run following curl command 4-5 times to observe interrupted transfers:

curl -kv 'https://213.32.77.185:4434/chunk.js'

I added a symbolic link to Iodine log file in public directory. You should be able to see it with:

curl -kv 'https://213.32.77.185:4434/iodine.log'

Strangely, when I launch this curl request in a terminal directly on the server, the issue does not occur.
The request seems to have to be remote.
I tried to disable the firewall, it changed nothing.

I hope it helps.