crate / crate-jdbc

A JDBC driver for CrateDB.

Home Page: https://crate.io/docs/jdbc/

CrateConnection creates a lot of background threads

bin01 opened this issue · comments

commented

We are noticing that for every CrateConnection object created, close to 21 background Elasticsearch threads are created by CrateClient. The following is a breakdown of the threads created, by type:

  • transport_client_worker: 2 * num cores = 16
  • transport_client_boss: 1
  • transport_client_timer: 1
  • generic: 1
  • scheduler: 1
  • timer: 1
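A breakdown like the one above can be reproduced from inside the JVM by grouping live threads by name. The following is only a minimal sketch: it assumes the thread names contain the labels listed above as substrings, and `ThreadBreakdown`, `classify`, and `countByLabel` are hypothetical helper names, not part of crate-jdbc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThreadBreakdown {
    // Labels taken from the list above. More specific labels come first so that
    // "transport_client_timer" is not counted under the plain "timer" label.
    // Assumption: the real thread names contain these labels as substrings.
    static final List<String> LABELS = Arrays.asList(
            "transport_client_worker", "transport_client_boss", "transport_client_timer",
            "generic", "scheduler", "timer");

    // Returns the first label contained in the thread name, or "other".
    static String classify(String threadName) {
        for (String label : LABELS) {
            if (threadName.contains(label)) {
                return label;
            }
        }
        return "other";
    }

    // Counts how many of the given thread names fall under each label.
    static Map<String, Integer> countByLabel(Iterable<String> threadNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : threadNames) {
            counts.merge(classify(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot the names of all live threads in this JVM.
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        System.out.println(countByLabel(names));
    }
}
```

Running `main` once before and once after opening a connection, and comparing the two maps, should show the per-connection overhead.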

This overhead wouldn't be a problem if a single CrateConnection were shared across many clients. But given that CrateConnection is technically a JDBC Connection, it shouldn't be shared, at least per the interface specification.

To reuse the CrateConnection, we currently create a pool of 10 CrateConnection objects, which results in the creation of close to 210 threads. Although we haven't run into many performance issues yet, we expect this to become a problem on systems with low resources.

To reduce this count, we could set "transport.netty.worker_count" to 1 or some other reasonable number, which would reduce the overhead to 6 threads per CrateConnection. Although it is still not ideal, it reduces the thread usage by about 70%. Is this solution any good? Does it have any side effects?
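The numbers in this post can be checked with a few lines of arithmetic. This is a sketch only: it assumes an 8-core machine (which matches the 16 workers reported) and the five fixed threads from the breakdown; `ThreadOverhead` and its methods are hypothetical names used for illustration.

```java
public class ThreadOverhead {
    // Fixed threads per connection from the breakdown in this issue:
    // transport_client_boss, transport_client_timer, generic, scheduler, timer.
    static final int FIXED_THREADS = 5;

    // Default worker count as reported in the issue: 2 * number of cores.
    static int defaultWorkerCount(int cores) {
        return 2 * cores;
    }

    static int threadsPerConnection(int workerCount) {
        return workerCount + FIXED_THREADS;
    }

    static int threadsForPool(int poolSize, int workerCount) {
        return poolSize * threadsPerConnection(workerCount);
    }

    // Percentage reduction from 'before' to 'after', rounded to the nearest integer.
    static long reductionPercent(int before, int after) {
        return Math.round(100.0 * (before - after) / before);
    }

    public static void main(String[] args) {
        int cores = 8; // assumption: 8 cores, matching the 16 workers reported
        int before = threadsPerConnection(defaultWorkerCount(cores));
        int after = threadsPerConnection(1); // worker_count set to 1

        System.out.println("per connection, default: " + before);          // 21
        System.out.println("per connection, 1 worker: " + after);          // 6
        System.out.println("pool of 10, default: " + threadsForPool(10, defaultWorkerCount(cores))); // 210
        System.out.println("pool of 10, 1 worker: " + threadsForPool(10, 1));                        // 60
        System.out.println("reduction %: " + reductionPercent(before, after));                       // 71
    }
}
```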

Another non-ideal solution is to make the CrateClient a singleton and share the same CrateClient whenever a CrateConnection is created for the same server URLs.
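A rough sketch of that idea, assuming the client is safe to share between connections (the thread does not confirm this): one client instance is cached per server URL string and handed out to every connection created for it. `SharedClientRegistry` and `clientFor` are hypothetical names, the client type is left generic because the real CrateClient constructor is not shown in this thread, and the URL string in `main` is only a placeholder.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical registry: caches one client per server URL string.
public class SharedClientRegistry<C> {
    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    // 'factory' builds a new client for a server URL string; it stands in for
    // whatever actually constructs the client in the driver.
    public SharedClientRegistry(Function<String, C> factory) {
        this.factory = factory;
    }

    // Returns the cached client for these URLs, creating it on first use.
    // computeIfAbsent ensures the factory runs at most once per key.
    public C clientFor(String serverUrls) {
        return clients.computeIfAbsent(serverUrls, factory);
    }

    public int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        // Stand-in client type (Object) purely for demonstration.
        SharedClientRegistry<Object> registry = new SharedClientRegistry<>(url -> new Object());
        Object first = registry.clientFor("host-a:4300");   // placeholder server string
        Object second = registry.clientFor("host-a:4300");
        Object other = registry.clientFor("host-b:4300");
        System.out.println(first == second); // true: same server, same client
        System.out.println(first == other);  // false: different server, different client
    }
}
```

This sketch never closes a client; a real version would also need reference counting or an explicit shutdown so the background threads are released once the last connection goes away.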

Hi @bin01,
you could set "transport.netty.worker_count" to 1. We will also optimize the default setting.
Basically this should scale with the capacity of your machine: fewer cores, fewer threads. We would also recommend scaling the number of CrateConnection objects with the system resources.
Most of the threads just idle anyway and therefore consume few resources.
Is that okay for you? Please tell us if you hit any limits.
Best, Johannes

commented

@joemoe I agree that most of the threads don't do anything. But a CrateConnection is a JDBC Connection, so technically it is not supposed to be shared, right? If that is true, then why do we need more than one thread in the context of a CrateConnection? Does Crate make parallel requests against many nodes in the cluster, which would require many threads?

@bin01, it is possible to reuse a Crate connection object; we are checking whether this is possible in the JDBC environment. We create several threads to keep the cluster synced. It doesn't necessarily mean there are parallel requests happening.

Hi @bin01, we don't think it would be a problem to reuse the connection as we are also doing this internally. But we don't know much about your implementation, so maybe give it a try.

commented

@joemoe Not sure if I made myself clear. The question is not if we can reuse the connection (which we are already doing), it is about sharing a CrateConnection.

@bin01 we are not sure what you mean by sharing. Between what? Multiple threads?
Maybe we can set up a call if you want. Write me an email: johannes [at] crate.io.