kafka-ops / julie

A solution to help you build automation and gitops in your Apache Kafka deployments. The Kafka gitops!

Connector is not being created if it was previously deleted (externally to JulieOps and using internal JulieOps state, topology.state.cluster.enabled=false)

vascoferraz opened this issue

Describe the bug
Connector is not created if it was previously deleted.
However, if the name of the connector is changed, it is created.

To Reproduce
Steps to reproduce the behavior:
1 - Run JulieOps to create a connector.
2 - Delete the connector using the Control Center or the REST API (see the example call below).
3 - Confirm that the connector no longer exists.
4 - Run JulieOps again: the connector is not recreated (no exceptions are thrown when running JulieOps).
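
For reference, the delete in step 2 can be done with a plain call to the Kafka Connect REST API (DELETE /connectors/{name}); the hostname, port, and credentials below are placeholders matching the redacted config further down:

curl -u user:pass -X DELETE https://connect-host:port/connectors/SnmpTrapSourceConnectorConnectorTest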

Expected behavior
The connector should be recreated.

Runtime

  • OS: Linux 4.18.0-348.7.1.el8_5.x86_64
  • JVM version: openjdk 11.0.13
  • JulieOps version: 4.1.1

topology file:

context: "xxx"
source: "yyy"
projects:
- name: "zzz"
  connectors:
    artifacts:
      - path: "/opt/julie-ops/SnmpTrapSourceConnectorConnectorTest.json"
        server: "connect1"
        name: "SnmpTrapSourceConnectorConnectorTest"
    access_control:
      - principal: "User:admin"
  topics:
    - name: "SnmpTrapSourceConnectorConnectorTest"
      config:
        replication.factor: "2"
        num.partitions: "3"

config file:

bootstrap.servers=host1:port, host2:port, host3:port
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="pass";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.truststore.location=/opt/julie-ops/client-truststore.jks
ssl.truststore.password=pass
schema.registry.url=https://sr-host:port
schema.registry.basic.auth.credentials.source=USER_INFO
schema.registry.basic.auth.user.info=user:pass

platform.servers.connect.0=connect0:https://host1:port
platform.servers.basic.auth.0=connect0@user:pass
platform.servers.connect.1=connect1:https://host1:port
platform.servers.basic.auth.1=connect1@user:pass
platform.servers.connect.2=connect2:https://host1:port
platform.servers.basic.auth.2=connect2@user:pass

connector config:

{
  "name": "SnmpTrapSourceConnectorConnectorTest",
  "config": {
    "connector.class": "io.confluent.connect.snmp.SnmpTrapSourceConnector",
    "kafka.topic": "snmp-kafka-topic",
    "snmp.v3.enabled": "false"
  }
}

Hi @vascoferraz,
this is indeed the case. From what I understand from your report:

1.- You created a connector using JulieOps.
2.- Deleted the connector externally.
3.- Then ran JulieOps again.

In this case the internal state still tells JulieOps the connector is there; you can see this in whichever of the internal state backends is used.

As you can see from https://github.com/kafka-ops/julie/blob/master/src/main/java/com/purbon/kafka/topology/KafkaConnectArtefactManager.java#L34

there are two options: get the state from the cluster directly, or from the internal state representation.

From what I can see in your property file, the property topology.state.cluster.enabled is left at its default, which is false (see https://github.com/kafka-ops/julie/blob/master/src/main/resources/reference.conf#L52), so your internal state file has literally no way of knowing about the change you made externally.
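
To illustrate the mechanics, here is a minimal sketch of that decision; every name in it is invented for illustration and does not match the real JulieOps sources:

import java.util.Collection;

// Minimal sketch of choosing the state source, not actual JulieOps code.
class ArtefactStateSource {

  interface ConnectClient { Collection<String> listConnectors(); }   // the cluster itself
  interface LocalStateFile { Collection<String> loadConnectors(); }  // e.g. .cluster-state

  private final boolean fetchStateFromCluster; // topology.state.cluster.enabled
  private final ConnectClient cluster;
  private final LocalStateFile localState;

  ArtefactStateSource(boolean fetchStateFromCluster,
                      ConnectClient cluster, LocalStateFile localState) {
    this.fetchStateFromCluster = fetchStateFromCluster;
    this.cluster = cluster;
    this.localState = localState;
  }

  // With the default (false), a connector deleted externally is still listed
  // in the local state, so JulieOps sees nothing new to create.
  Collection<String> currentConnectors() {
    return fetchStateFromCluster ? cluster.listConnectors()
                                 : localState.loadConnectors();
  }
}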

There might be ways to make this part smarter, but in the meantime, if it is a possibility for you, I would suggest to:

  • Configure topology.state.cluster.enabled=true, so the state is fetched directly from the cluster.
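
Concretely, that is a single extra line in the property file from this report:

topology.state.cluster.enabled=true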

Let me know if this does not work for you.

-- Pere

@purbon, thank you so much for your reply.

By configuring topology.state.cluster.enabled=true I managed to redeploy a connector that had previously been deleted via the REST API or the Control Center.

Deleting the file .cluster-state also worked.

However, the most important thing here was understanding how JulieOps works under the hood.

Thanks.

Hi @vascoferraz

By configuring topology.state.cluster.enabled=true I managed to redeploy a connector that had previously been deleted via the REST API or the Control Center.

Happy this solved the situation for you; the main reason it works is that the state is now recovered directly from the target cluster.

Now, looking to the future:
if you could dream it up, how would you imagine JulieOps best handling this internal situation?

Would a WARN or an Exception be good, so you then fix it manually? Or should JulieOps refresh the internal state?

What do you think?

@purbon, I would say that at least a warning would be great, referencing the property topology.state.cluster.enabled and how it could help, so it would be easier to understand what is going on and fix it manually.

Refreshing the internal state might be a double-edged sword.

So, maybe a good approach is to inform the user: "The internal state of JulieOps does not match the current server settings" and then list those differences.

I would probably start by questioning the local cluster state file. We should assume connectors can and will be removed by other tools, and connectors do keep state in Kafka, so the default should be going to the cluster for the information. We should also assume we will often run JulieOps from a container, which means this local file will get deleted anyway.

I imagine keeping a local state speeds JulieOps up somehow? I think we should offer the option to use a local state, with the warning that you should not manage connectors outside of JulieOps if you do so. Checking the cluster should be the default instead.

Alternatively (or in addition), maybe you could have a fallback mechanism: if the state doesn't match, then fetch from the cluster and update your local cache (see the sketch below).
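
A minimal sketch of that warn-and-refresh fallback, with invented names (this is not JulieOps code), assuming connector names can be listed from both the local state and the cluster:

import java.util.HashSet;
import java.util.Set;

// Hypothetical warn-and-refresh fallback; all names here are made up.
class StateReconciler {

  // Compares the cached state with what the cluster reports, warns about
  // the differences, and returns the cluster view as the new local cache.
  Set<String> reconcile(Set<String> localState, Set<String> clusterState) {
    Set<String> missingOnCluster = new HashSet<>(localState);
    missingOnCluster.removeAll(clusterState);

    if (!missingOnCluster.isEmpty()) {
      System.err.println("WARN: internal state does not match the cluster; "
          + "connectors missing on the cluster: " + missingOnCluster
          + ". See topology.state.cluster.enabled.");
    }
    // Trust the cluster: with a refreshed cache, a connector deleted
    // externally is recreated on the next JulieOps run.
    return new HashSet<>(clusterState);
  }
}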

With the addition of #478 this issue should now be more under control. I'm closing it for now; feel free to reopen if necessary.

Thanks a lot everyone for your comments and contributions here! This feature will be included in upcoming versions.