Netflix / dynomite

A generic dynamo implementation for different k-v storage engines


Connection leak or connections not closing/released/recycled when client app is down

balajivenki opened this issue

Hi @shailesh33 and @ipapapa

I am running dynomite version dynomite-v0.6.11-dirty, and my dynomite conf is:

```yaml
dyn_o_mite:
  auto_eject_hosts: true
  data_store: 0
  datacenter: DC2
  dyn_listen: 0.0.0.0:50555
  dyn_seed_provider: simple_provider
  dyn_seeds:
    - dynomit02.*****.com:50555:rack02:DC2:858993459
    - dynomit04.*****.com:50555:rack02:DC2:1717986918
    - dynomit06.*****.com:50555:rack02:DC2:2576980377
    - dynomit08.*****.com:50555:rack02:DC2:3435973836
    - dynomit10.*****.com:50555:rack02:DC2:4294967295
    - dynomit01.*****.com:50555:rack01:DC2:858993459
    - dynomit03.*****.com:50555:rack01:DC2:1717986918
    - dynomit07.*****.com:50555:rack01:DC2:3435973836
    - dynomit09.*****.com:50555:rack01:DC2:4294967295
  enable_gossip: false
  env: network
  listen: 0.0.0.0:50556
  mbuf_size: 16384
  pem_key_file: /path/dynomite.pem
  rack: rack01
  read_consistency: dc_one
  secure_server_option: datacenter
  server_failure_limit: 3
  server_retry_timeout: 30000
  servers:
    - 127.0.0.1:50180:1
  timeout: 150000
  tokens: 2576980377
  write_consistency: dc_one
```

`curl -k localhost:22222/info | grep client_connections` shows about 60k client connections on this node. When I ran netstat, I found that a single host (a Docker Swarm worker running multiple containers of apps that use dynomite) accounted for roughly 29k established connections, yet when we checked inside the apps, they report only a handful of open connections to the server.
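For reference, a rough way to see which remote hosts hold the established connections on the client port (50556 here, taken from the `listen:` setting above) is to group the netstat output by peer address; the port number and standard coreutils are the only assumptions:

```sh
# Count ESTABLISHED connections to the client port, grouped by remote host.
# Adjust the port (50556) to match the listen: setting in dynomite.yml.
netstat -ant \
  | awk '$6 == "ESTABLISHED" && $4 ~ /:50556$/ {split($5, a, ":"); print a[1]}' \
  | sort | uniq -c | sort -rn | head
```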

So we brought down the swarm node and moved all its containers to another worker. We expected the dynomite server to recycle and release those connections, but it does not: the swarm worker is down and all containers have been relocated, yet dynomite still holds on to the roughly 29k connections that were established while that node was alive.

This becomes a problem once the connection count exceeds 100k and the file-descriptor limit is hit, eventually crashing the nodes.
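A minimal sketch for watching the descriptor usage of the dynomite process against its limit (the process name `dynomite` is an assumption; adjust it to your binary name):

```sh
# Hypothetical check: compare open fds of the dynomite process to its limit.
PID=$(pidof dynomite)                       # assumes the binary is named "dynomite"
echo "open fds: $(ls /proc/$PID/fd | wc -l)"
grep 'Max open files' /proc/$PID/limits     # soft/hard fd limits for this process
```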

Is there a setting we are missing, or is there a known issue with connection recycling in dynomite?

`netstat -antp | grep :50556 | grep ESTABLISHED | wc -l` shows ~60k connections, and about 29k of them are from the swarm node we stopped a long time ago. We have to restart the redis/dynomite node to get back to 0 connections.

Please help.

We also found that the connections made from the client do not have TCP keepalive enabled; it is off. That would explain why the connections are never closed on the server side. Is this a bug? Shouldn't client connections have keepalive enabled? Please suggest if I am missing anything here.
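One way to confirm this from the server side is to look at the socket timers: `ss -o` prints a `timer:(keepalive,...)` field only for sockets that have SO_KEEPALIVE armed, and `sysctl` shows the kernel intervals that would apply if it were enabled. This is only a diagnostic sketch; the port is again taken from the config above:

```sh
# Show established sockets on the client port with their timers.
# Sockets without a "timer:(keepalive,...)" field have keepalive disabled.
ss -tno state established '( sport = :50556 )'

# Kernel defaults that apply once SO_KEEPALIVE is set on a socket (Linux).
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes
```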

#692 is merged. Closing this.