aantron / dream

Tidy, feature-complete Web framework

Home Page: https://aantron.github.io/dream/

Client connections not properly closed?

hannesm opened this issue

We at robur (//cc @reynir) run a dream (alpha2) web application on the public Internet (behind a TLS reverse proxy) -- https://builds.robur.coop

Now, after several days during which lots of clients have been served (and some have errored because the connection was prematurely closed/reset), we observe lots of sockets owned by the Dream process that are in the CLOSED state according to netstat.

Could it be that there are code paths that do not call {Unix,Lwt_unix}.close on client sockets, and thus leak file descriptors? Could this be related to #118? I suspect that cleanly terminating connections do the right thing [tm] (and call close()), but could there be error cases where close is not called? I find the code rather hard to follow, but perhaps you have an idea where the root cause is located?
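(For illustration only, and not Dream's or http/af's actual code: the kind of error path meant here is one where an exception skips the close. A common guard is to wrap the per-client work in Lwt.finalize so that Lwt_unix.close runs on both the success and the error path.)

    (* Hypothetical sketch, not taken from Dream or http/af: if per-client
       handling is structured like this, an exception raised inside [serve]
       cannot skip the close, because Lwt.finalize runs the cleanup on both
       success and failure. *)
    let handle_client socket serve =
      Lwt.finalize
        (fun () -> serve socket)
        (fun () -> Lwt_unix.close socket)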

I'll have to reproduce it. I'll check whether the Dream playground web server has the same issue.

As a note, Dream itself never closes sockets directly. This is done by the underlying web server, http/af. The issue could be either in http/af itself, or in how Dream interacts with it.

Thanks for confirming. As a user of Dream, we also don't have to close anything specifically, or do we?

I suspect normal connections (curl etc.) are handled nicely, but maybe for disappearing clients (or those not sending Connection: close but nevertheless closing the TCP connection) the fd isn't closed gracefully in the end?
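(As a hypothetical illustration, such a disappearing client could be simulated along these lines; the port and request below are placeholders, not taken from the report.)

    (* Hypothetical reproduction sketch: connect to a local server, send a
       partial HTTP request, then close the socket abruptly without waiting
       for a response. Port 8080 is a placeholder. *)
    let disappearing_client () =
      let open Lwt.Syntax in
      let socket = Lwt_unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
      let* () =
        Lwt_unix.connect socket Unix.(ADDR_INET (inet_addr_loopback, 8080)) in
      let request = "GET / HTTP/1.1\r\nHost: localhost\r\n" in
      let* _sent =
        Lwt_unix.write_string socket request 0 (String.length request) in
      Lwt_unix.close socket

    let () = Lwt_main.run (disappearing_client ())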

> Thanks for confirming. As a user of Dream, we also don't have to close anything specifically, or do we?

Ideally, Dream should take care of as many such things as it can automatically. I can't say much more without knowing which functions you're calling and how. But ideally, it shouldn't be easy to cause an fd leak.

> I suspect normal connections (curl etc.) are handled nicely, but maybe for disappearing clients (or those not sending Connection: close but nevertheless closing the TCP connection) the fd isn't closed gracefully in the end?

I don't know at this point.

By the way, is this causing a problem besides the accumulation of fds? Is the process running out of them well before you might otherwise restart it, for example?

FWIW, an nginx server behind the same TLS reverse proxy does not have any sockets in state CLOSED.

> I can't say much more without knowing which functions you're calling and how. But ideally, it shouldn't be easy to cause an fd leak.

Code is at https://github.com/roburio/builder-web :)

bin/builder_web_app.ml

    Dream.run ~port ~interface:host ~https:false
    @@ Dream.logger                            (* request logging middleware *)
    @@ Dream.sql_pool ("sqlite3:" ^ dbpath)    (* SQL connection pool for the SQLite database *)
    @@ Http_status_metrics.handle              (* HTTP status metrics middleware *)
    @@ Builder_web.add_routes datadir          (* the application's routes *)
    @@ Dream.not_found                         (* fallback handler: always responds 404 *)

and inside the handlers (lib/builder_web.ml):

Dream.respond / Dream.redirect / Dream.empty
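(A hedged illustration of what such handlers typically look like; this is not the actual builder-web code, and the paths are made up.)

    (* Illustrative handler only, not taken from builder-web: each branch
       ends in one of the functions named above. *)
    let handler request =
      match Dream.target request with
      | "/" -> Dream.respond "index"
      | "/old" -> Dream.redirect request "/"
      | _ -> Dream.empty `Not_Found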

> By the way, is this causing a problem besides the accumulation of fds? Is the process running out of them well before you might otherwise restart it, for example?

Yes. This is the issue.

Thanks. Do you know if the fds that remain open might be related to any events you observe in the logs?

I do not know that, unfortunately. Maybe it would be a good idea to include the file descriptor number in the log output, so that a correlation would be possible?
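(If it helps, here is a minimal sketch of such a correlation aid, using only the public Dream API and a hypothetical name log_client. Dream does not expose the underlying fd to handlers, so the client address from Dream.client stands in for it.)

    (* Hypothetical middleware: log the client address for every request so
       that log lines can be correlated with sockets seen in netstat. The
       raw fd is not exposed by Dream, so Dream.client is used instead. *)
    let log_client inner_handler request =
      Dream.log "client: %s" (Dream.client request);
      inner_handler request

It would be installed in the middleware stack alongside Dream.logger.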