dashbitco / broadway_cloud_pub_sub

A Broadway producer for Google Cloud Pub/Sub


Maintain a connection pool for HTTP requests

mcrumm opened this issue

This is an issue for further discussion of the HTTP connection pool settings.

"...Also, I believe amazon sqs also defaults to hackney, so maybe we can discuss a unified configuration/api for passing down options to the http client. It seems another thing we may want to do is to always configure the http client pool to have at least the number of producers + number of processors. Although I have seen some people setting up thousands of processors, so we may want a cap on that. maybe number of producers + 2 * number of cores?"

Originally posted by @josevalim in #18 (comment)
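The sizing heuristic in the quote above can be written out as a small sketch. All of the numbers and variable names here are illustrative, not an API; `System.schedulers_online/0` stands in for "number of cores":

```elixir
# Sketch of the heuristic quoted above: grow the pool with the pipeline,
# but cap it so pipelines with thousands of processors don't open
# thousands of sockets. Values are illustrative.
producers = 2
processors = 5_000
cores = System.schedulers_online()

uncapped = producers + processors
cap = producers + 2 * cores
pool_size = min(uncapped, cap)
```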

As a producer, or, in this case, as the client being initialized by a producer, what information would I need from Broadway to determine an appropriate pool size?

Producer stages, batch sizes, and min/max demand all seem like they could be relevant here, and that leads me to believe that maybe Broadway really should be telling the producer what options to use.

For Cloud Pub/Sub, we definitely need one connection per producer for pulling messages. Then, ideally, we always have a connection available when we need to acknowledge, so that those requests aren't getting queued. I'm not familiar enough with how Broadway determines when to make an ack/3 call to say how many connections would be required given a particular pipeline configuration, so maybe 2 * number of cores is the safest option there.
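Since both this producer and the SQS one currently default to hackney, one way to apply the "producers plus ack headroom" sizing above is a dedicated hackney pool. This is a sketch under that assumption; the pool name and the idea of wiring it in are illustrative, as the actual producer options are what this issue is discussing:

```elixir
# Config-fragment sketch: a dedicated hackney pool sized for one
# connection per producer (pulls) plus 2 * cores of headroom (acks).
# The :pubsub_pool name is illustrative.
producers = 2
pool_size = producers + 2 * System.schedulers_online()

:ok =
  :hackney_pool.start_pool(:pubsub_pool,
    timeout: 15_000,
    max_connections: pool_size
  )

# Requests made with [pool: :pubsub_pool] in the hackney options
# would then check connections out of this pool.
```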

@msaraiva, I believe we have been progressively giving more options to the producer so it can configure or learn more about the rest of the pipeline. Do you have any suggestions here?

Currently, if we need to pass any information about the rest of the pipeline configuration, we have to change Broadway.Server manually to pass that information along. I had to do this for the Kafka producer, which requires the number of processors so we can provide a hash function for partitioning. Maybe we should consider passing the whole configuration to all of them (producers, processors, batchers, ...) and letting each one decide which information it needs.
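A hypothetical shape for that proposal: Broadway injects the full pipeline configuration into each stage's options, and the stage reads out only what it needs. The `:broadway` key, its contents, and the producer module are all assumptions for illustration, not the actual API:

```elixir
# Hypothetical sketch of the proposal above: the producer receives the
# whole pipeline configuration (here under an assumed :broadway key)
# and derives its own pool size from it.
defmodule MyProducer do
  use GenStage

  def init(opts) do
    config = opts[:broadway]
    # Assumed shape: the producer stage count lives in the config.
    producers = get_in(config, [:producer, :stages]) || 1

    pool_size = producers + 2 * System.schedulers_online()
    # ...start or look up an HTTP pool with pool_size connections...
    {:producer, %{pool_size: pool_size}}
  end
end
```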

@msaraiva ok, if you need it for Kafka and you need it here, then we will indeed have to change Broadway to pass it generally.

I don't think we need to worry about processors/batchers though since they are always controlled by Broadway. It is only complicated for producers because they have a public API.