Connection leak or connections not closing/released/recycled when client app is down
balajivenki opened this issue · comments
Hi @shailesh33 and @ipapapa,
I am running Dynomite version dynomite-v0.6.11-dirty, and my Dynomite config is:
```yaml
dyn_o_mite:
  auto_eject_hosts: true
  data_store: 0
  datacenter: DC2
  dyn_listen: 0.0.0.0:50555
  dyn_seed_provider: simple_provider
  dyn_seeds:
  - dynomit02.*****.com:50555:rack02:DC2:858993459
  - dynomit04.*****.com:50555:rack02:DC2:1717986918
  - dynomit06.*****.com:50555:rack02:DC2:2576980377
  - dynomit08.*****.com:50555:rack02:DC2:3435973836
  - dynomit10.*****.com:50555:rack02:DC2:4294967295
  - dynomit01.*****.com:50555:rack01:DC2:858993459
  - dynomit03.*****.com:50555:rack01:DC2:1717986918
  - dynomit07.*****.com:50555:rack01:DC2:3435973836
  - dynomit09.*****.com:50555:rack01:DC2:4294967295
  enable_gossip: false
  env: network
  listen: 0.0.0.0:50556
  mbuf_size: 16384
  pem_key_file: /path/dynomite.pem
  rack: rack01
  read_consistency: dc_one
  secure_server_option: datacenter
  server_failure_limit: 3
  server_retry_timeout: 30000
  servers:
  - 127.0.0.1:50180:1
  timeout: 150000
  tokens: 2576980377
  write_consistency: dc_one
```
`curl -k localhost:22222/info | grep client_connections` shows around 60k client connections to the node. Running netstat, we found that a single host (a Docker Swarm worker running multiple containers of apps that use Dynomite) accounted for about 29k established connections, yet the apps themselves reported only a handful of open connections.
We then brought down that Swarm worker and moved all of its containers to other workers, expecting the Dynomite server to detect the dead peer and recycle/release those connections. It does not: the worker is down and its containers have moved, but Dynomite still holds the ~29k connections that were opened while the worker was alive.
This becomes a real problem once the connection count passes 100k: the file-descriptor limit is exceeded and the nodes eventually crash.
Is there a setting we are missing, or is this a known issue with connection recycling in Dynomite?
`netstat -antp | grep :50556 | grep ESTABLISHED | wc -l` shows ~60k connections, of which ~29k are from the Swarm node we stopped long ago. The only way we have found to get back to 0 connections is to restart the Redis/Dynomite node.
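To make stale peers stand out, it helps to group the established connections on the client port (50556 in the config above) by remote host rather than just counting them. A minimal sketch of that bookkeeping, using an illustrative canned sample of `netstat -antp` output (the hosts and PIDs are made up; in practice you would feed it real netstat output):

```python
from collections import Counter

# Illustrative netstat -antp style output; fields are:
# proto recv-q send-q local-address foreign-address state pid/program
sample = """\
tcp 0 0 10.0.0.5:50556 10.0.1.7:41000 ESTABLISHED 123/dynomite
tcp 0 0 10.0.0.5:50556 10.0.1.7:41001 ESTABLISHED 123/dynomite
tcp 0 0 10.0.0.5:50556 10.0.2.9:52000 ESTABLISHED 123/dynomite
"""

def count_by_remote(netstat_output: str, port: int) -> Counter:
    """Count ESTABLISHED connections on the given local port per remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "ESTABLISHED":
            local, remote = fields[3], fields[4]
            if local.endswith(f":{port}"):
                counts[remote.rsplit(":", 1)[0]] += 1
    return counts

print(count_by_remote(sample, 50556).most_common())
# → [('10.0.1.7', 2), ('10.0.2.9', 1)]
```

A remote host with thousands of entries that no longer runs any client containers is a good candidate for the leaked connections described above.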
Please help.
We also found that the connections made from the client do not have TCP keepalive enabled, which explains why the connections are never closed on the server side. Is this a bug? Client connections should typically have keepalive enabled, right? Please let me know if I am missing anything here.
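For context: TCP keepalive is off by default and must be enabled per socket by the application, which is why idle sockets from a dead peer can sit in ESTABLISHED forever. A minimal Python sketch of what a client would do (illustrative only, not Dynomite code or any specific client library; the Linux-only per-socket timers are guarded since they are not available on every platform):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Turn keepalive on for this socket; without this the kernel sends no probes.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# On Linux the probe timers can also be tuned per socket (otherwise the
# system-wide net.ipv4.tcp_keepalive_* sysctls apply, defaulting to hours).
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before the kernel drops the connection

print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))     # non-zero once enabled
```

With keepalive on, the kernel itself would eventually tear down connections to a host that has gone away, even if neither application sends data.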
#692 is merged. Closing this.