confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

Explain limitation listed in the documentation

j-santander opened this issue

I'm starting with Kafka Connect, Hadoop/HDFS and Kerberos... so, I'm probably missing some basic concepts.

In the published documentation (https://docs.confluent.io/kafka-connect-hdfs/current/overview.html#limitations), the following limitation is listed:

The HDFS 2 Sink connector does not permit you to run multiple instances of the connector in the same Kerberos environment.

The meaning of this limitation is unclear to me. As written, my interpretation was that it is not possible to deploy more than one instance of the HDFS connector on one Kafka Connect cluster (group of workers) when using Kerberos.

However, I've successfully created two working instances of the HDFS connector (both connected to the same kerberized HDFS and with the same connecting principal aka user), so I'm puzzled.

I guess there are many different pieces interacting here, so the limitation might lie somewhere else...

We have:

  • Kafka Connect cluster: A set of nodes sharing the same configuration.
  • Kafka Connect worker: A node belonging to a cluster.
  • HDFS Sink Connector: An instance of the connector:
    • Deployed within a Kafka Connect cluster
    • Connecting with a Kerberos principal within a Kerberos realm.
    • Mapping a set of topics to an HDFS cluster (URL).
  • HDFS Cluster: A set of storage nodes.
    • Associated to a Kerberos realm.

So, as I said, what is the use case that is not possible?

Thanks very much in advance, and please excuse me if this is too basic a question.

I believe the limitation is "multiple connectors to different Hadoop environments within a single Connect worker", and it's due to #325

Thanks,

Let me write it in my own words.

Within one Kafka Connect worker, it is not possible to set up connectors that connect to different Hadoop clusters.

Setting up multiple connectors is possible, as long as all of them connect to the same cluster.

Is that accurate enough?

Thanks again.

I am new to this connector, but as far as I know, "Setting up multiple connectors is possible, as long as all of them connect to the same cluster and with the same Kerberos principal".
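
For what it's worth, the rule as summarised in this thread could be sketched as a small pre-deployment check. This is only an illustration of the thread's understanding, not verified connector behaviour. The config keys `connector.class`, `hdfs.url` and `connect.hdfs.principal` are the connector's documented property names; the helper functions, connector names, hostnames and principals are all hypothetical, and comparing `hdfs.url` as a plain string is a simplifying assumption.

```python
# Sketch: do all HDFS sink connectors on a worker target the same
# HDFS cluster with the same Kerberos principal?
# Helper names and all example values below are hypothetical.

HDFS_SINK_CLASS = "io.confluent.connect.hdfs.HdfsSinkConnector"


def kerberos_targets(connectors):
    """Group HDFS sink connector names by (hdfs.url, connect.hdfs.principal)."""
    groups = {}
    for name, config in connectors.items():
        # Ignore connectors of any other type.
        if config.get("connector.class") != HDFS_SINK_CLASS:
            continue
        key = (config.get("hdfs.url"), config.get("connect.hdfs.principal"))
        groups.setdefault(key, []).append(name)
    return groups


def can_share_worker(connectors):
    """True if at most one (cluster, principal) pair is in use."""
    return len(kerberos_targets(connectors)) <= 1


# Two connectors, same cluster and same principal (the setup that worked above).
same_cluster = {
    "hdfs-sink-a": {
        "connector.class": HDFS_SINK_CLASS,
        "topics": "orders",
        "hdfs.url": "hdfs://namenode-1.example.com:8020",
        "hdfs.authentication.kerberos": "true",
        "connect.hdfs.principal": "connect@EXAMPLE.COM",
    },
    "hdfs-sink-b": {
        "connector.class": HDFS_SINK_CLASS,
        "topics": "payments",
        "hdfs.url": "hdfs://namenode-1.example.com:8020",
        "hdfs.authentication.kerberos": "true",
        "connect.hdfs.principal": "connect@EXAMPLE.COM",
    },
}

# Adding a connector that points at a different Hadoop cluster.
mixed_clusters = {
    **same_cluster,
    "hdfs-sink-c": {
        "connector.class": HDFS_SINK_CLASS,
        "topics": "audit",
        "hdfs.url": "hdfs://namenode-2.example.org:8020",
        "hdfs.authentication.kerberos": "true",
        "connect.hdfs.principal": "connect@EXAMPLE.ORG",
    },
}

print(can_share_worker(same_cluster))    # True
print(can_share_worker(mixed_clusters))  # False
```

If the check returns False, the thread's reading of the limitation suggests running the conflicting connectors on separate Connect workers.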