pipelinedb / pipeline_kafka

PipelineDB extension for Kafka support

Kafka SSL

derekjn opened this issue

(Moved from @michaelsauter pipelinedb/pipelinedb#1827)

Is it possible to connect to Kafka via SSL?

http://docs.pipelinedb.com/integrations.html#apache-kafka does not mention anything so I assume it's not possible? That would be very handy though :)

@michaelsauter this should be possible: you'd just need to compile librdkafka with SSL support enabled, and then build pipeline_kafka against that version of the library. If you feel comfortable building these from source, we'd love to hear back from you regarding the results!

These things are fairly straightforward to build, but if you'd rather not build them yourself, we can take a look when we get some free cycles. I think users would be interested in at least knowing how to build this support into their pipeline_kafka deployment.

@derekjn Sorry for opening the issue in the wrong repo & thanks for your quick reply!

I can take a shot at building it with SSL support. However, how would I configure pipeline_kafka to use my certs?

> Sorry for opening the issue in the wrong repo & thanks for your quick reply!

@michaelsauter oh please, don't worry about it!

> However, how would I configure pipeline_kafka to use my certs?

This is kind of unexplored territory for us so I'm not actually completely sure. I think the right place to start is understanding how to configure librdkafka for SSL connectivity.

And then pipeline_kafka accepts a list of key-value pairs to pass to the librdkafka client via pipeline_kafka.consumer_config.
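To make the expected format concrete, here's a small illustrative sketch (this is a hypothetical helper, not pipeline_kafka's actual parser) of how a comma-separated `pipeline_kafka.consumer_config` string maps to librdkafka option pairs:

```python
def parse_consumer_config(raw: str) -> dict:
    """Split a 'key=value,key=value' string into librdkafka option pairs.

    Hypothetical helper for illustration only; pipeline_kafka does its
    own parsing internally.
    """
    pairs = {}
    for item in raw.split(","):
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs

conf = parse_consumer_config(
    "security.protocol=ssl,ssl.ca.location=ca-cert,ssl.key.password=secret"
)
print(conf["security.protocol"])  # → ssl
```

Each resulting key/value pair is handed to the librdkafka client as-is, so the keys must be valid librdkafka configuration property names.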

So here's what I think a fairly straightforward approach might be:

  1. Get a librdkafka consumer working with SSL (independently of pipeline_kafka) to understand what the configuration looks like.
  2. Build librdkafka and pipeline_kafka with SSL support enabled.
  3. Pass the configuration determined in (1) to pipeline_kafka via pipeline_kafka.consumer_config.

Okay, I tried to get this working but have failed so far.

Here's what I tried:

  • For some background, I already have SSL set up on the brokers and use it in consumers based on ruby-kafka. The SSL setup there is based on 3 files; see client authentication for more details. This setup works.
  • Reading through librdkafka for SSL, it sounds like the only thing I need to do is the last part, "Configure librdkafka client", as the previous steps are about creating the PEM files etc., which I already have.
  • With that assumption, I copied https://github.com/pipelinedb/pipelinedb/blob/master/pkg/docker/Dockerfile, replaced && ./configure --CFLAGS="-fPIC" --prefix=/usr --disable-ssl --disable-sasl \ with && ./configure --CFLAGS="-fPIC" --prefix=/usr \, and built a new image, pipelinedb-ssl.
  • I then used that Docker image to run a new container with docker run -v /dev/shm:/dev/shm -v $PWD/data:/var/lib/pipelinedb/data --name pipelinedb2 --net dev pipelinedb-ssl.
  • After that, I modified pipelinedb.conf to include pipeline_kafka.consumer_config = 'security.protocol=ssl,ssl.ca.location=ca-cert,ssl.certificate.location=client-cert,ssl.key.location=client-key,ssl.key.password=secret', and placed the certs in the data directory.
  • I started the container again, added a broker and tried to consume.
  • I got the following errors:
ERROR:  [pipeline_kafka] logs_stream <- logs (PID 45): failed to acquire metadata: Local: Timed out
LOG:  worker process: [kafka consumer] logs_stream <- logs (PID 45) exited with exit code 1
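For reference, the Dockerfile change described above amounts to something like the following fragment (the clone URL and surrounding steps are assumptions based on the upstream Dockerfile, shown here only to highlight the configure line):

```dockerfile
# Build librdkafka with SSL support enabled: dropping the
# --disable-ssl --disable-sasl flags lets ./configure pick up
# OpenSSL if its headers are present in the build image.
RUN git clone https://github.com/edenhill/librdkafka.git \
  && cd librdkafka \
  && ./configure --CFLAGS="-fPIC" --prefix=/usr \
  && make \
  && make install
```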

Now I'm unsure what to do next. Did I build librdkafka incorrectly? Is something wrong with how I linked to the certs?

Are there any other error logs I could inspect?

I verified that I can connect to the Kafka container correctly and consume topics over PLAIN.

@derekjn Any ideas what I could try to get this working? Would love to use PipelineDB but this is a show-stopper at the moment :/

@michaelsauter thanks for following up! I'll make some time to personally look into this over the next couple of days. I'll report back here soon, definitely this week.

@michaelsauter Quick update: we've reproduced the behavior you described, so we should have a resolution for you relatively soon. We also came across this issue in librdkafka while debugging:

confluentinc/librdkafka#1347

And it's possible that this issue is related to it. Anyways, we'll keep you posted!

@michaelsauter so this was most likely due to a bug that caused the topic-level librdkafka configuration to be set by default, which put your SSL settings in an unexpected place. After fixing the bug (8525850), we were able to verify that pipeline_kafka works over SSL. After adding an SSL listener:

SELECT pipeline_kafka.add_broker('<ssl listener host>:<ssl listener port>');

We used the following consumer config:

pipeline_kafka.consumer_config='security.protocol=ssl,ssl.ca.location=ca-cert,ssl.certificate.location=client.pem,ssl.key.password=password'

Also, you may have used this already, but one approach that's helpful for debugging is to use the kafkacat tool to verify connectivity to Kafka, since it's based on librdkafka. e.g.,

kafkacat -C -b <broker host>:<ssl listener port> -t topic -Xsecurity.protocol=ssl -Xssl.ca.location=ca-cert -Xssl.certificate.location=client.pem -Xssl.key.password=password

Once you're able to successfully connect via something like kafkacat, you should be able to transfer the settings to pipeline_kafka via pipeline_kafka.consumer_config. Please let us know if this works for you on master, and thanks for your patience here!
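To make that transfer mechanical, here's a small sketch (hypothetical helpers, not part of pipeline_kafka or kafkacat) that renders one set of librdkafka options both as kafkacat -X flags and as a pipeline_kafka.consumer_config value, so the two stay in sync:

```python
def to_kafkacat_flags(opts: dict) -> list:
    """Render librdkafka options as kafkacat -X arguments."""
    return [f"-X{key}={value}" for key, value in opts.items()]

def to_consumer_config(opts: dict) -> str:
    """Render the same options as a pipeline_kafka.consumer_config string."""
    return ",".join(f"{key}={value}" for key, value in opts.items())

# Example SSL settings (paths and password are placeholders).
ssl_opts = {
    "security.protocol": "ssl",
    "ssl.ca.location": "ca-cert",
    "ssl.certificate.location": "client.pem",
    "ssl.key.password": "password",
}

print(" ".join(to_kafkacat_flags(ssl_opts)))
print(to_consumer_config(ssl_opts))
```

Once the kafkacat invocation built from the flags works, the consumer_config string built from the same dict should carry identical settings.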

@derekjn Thank you so much for looking into this! Awesome.

Unfortunately, I can't get to it in the next days, but will try it again next week and let you know if it worked.

Also thanks a lot for the tip regarding kafkacat, I haven't used it yet.

@derekjn I actually found some time to sneak this in today; I just had to try it out :) And it worked!

I used exactly what I described above.

@michaelsauter great to hear, our apologies for taking so long on this one. Let us know if you have any further issues here or with PipelineDB!