Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage

Tiered Storage functionality is disabled in the broker. Topic cannot be configured with remote log storage.

Gamote opened this issue

Hello everyone, first of all thank you so much for all your work that you put into this plugin. It is very helpful to the community!

What can we help you with?

For almost a week we have been trying to enable Tiered Storage for our Kafka cluster, but we get the following error when we try to enable it for a topic:

Error while executing config command with args '--command-config /tmp/kafka-client.properties --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name kafkaconnect.product-api.products --add-config remote.storage.enable=true, local.retention.bytes=10737418240'

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidConfigurationException: Tiered Storage functionality is disabled in the broker. Topic cannot be configured with remote log storage.
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
        at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:374)
        at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:341)
        at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:97)
        at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidConfigurationException: Tiered Storage functionality is disabled in the broker. Topic cannot be configured with remote log storage.
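
For reference, the shape of the invocation above can be sketched in Python as an argument list. This is only an illustration: the function name `build_alter_topic_args` is hypothetical, the script name `kafka-configs.sh` is assumed to be on the PATH, and the sketch joins the `--add-config` pairs with commas and no surrounding whitespace. It does not change the broker-side check that produces the error.

```python
def build_alter_topic_args(bootstrap, topic, configs, command_config=None):
    """Build an argv list for altering topic configs with kafka-configs.sh.

    Hypothetical helper: the script name/path is an assumption and may differ
    per installation (for example, a full path inside a container image).
    """
    args = ["kafka-configs.sh", "--bootstrap-server", bootstrap]
    if command_config:
        args += ["--command-config", command_config]
    # --add-config takes comma-separated key=value pairs.
    pairs = ",".join(f"{key}={value}" for key, value in configs.items())
    args += [
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", pairs,
    ]
    return args


argv = build_alter_topic_args(
    bootstrap="localhost:9092",
    topic="kafkaconnect.product-api.products",
    configs={
        "remote.storage.enable": "true",
        "local.retention.bytes": 10 * 1024**3,  # 10 GiB = 10737418240 bytes
    },
    command_config="/tmp/kafka-client.properties",
)
print(" ".join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` on a host where the Kafka CLI tools are installed.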

We have created a custom Docker image based on the Bitnami Kafka one, and our config looks like this:

# ----- Add SASL/SCRAM configuration -----

listener.security.protocol.map=CLIENT:SASL_PLAINTEXT,INTERNAL:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAINTEXT

# ----- Increase Tiered Storage log level -----

log4j.logger.org.apache.kafka.server.log.remote.storage=DEBUG
log4j.logger.org.apache.kafka.server.log.remote.metadata.storage=DEBUG
log4j.logger.kafka.log.remote=DEBUG
log4j.logger.org.apache.kafka.clients.admin=DEBUG
log4j.logger.org.apache.kafka.common.network=DEBUG
log4j.logger.org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager=DEBUG
log4j.logger.io.aiven.kafka.tieredstorage=DEBUG
log4j.logger.io.aiven.kafka.tieredstorage.storage=DEBUG

# ----- Enable tiered storage -----

remote.log.storage.system.enable=true

# ----- Configure the remote log manager -----

# This is the default, but adding it for explicitness:
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager

# Put the real listener name you'd like to use here:
remote.log.metadata.manager.listener.name=INTERNAL

# ----- Configure the remote storage manager -----

# Here you need either one or two directories depending on what you did in Step 1:
remote.log.storage.manager.class.path=/opt/bitnami/kafka/plugins/tiered-storage-core/*:/opt/bitnami/kafka/plugins/tiered-storage-gcs/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager

# 4 MiB is the current recommended chunk size:
rsm.config.chunk.size=4194304

# ----- Configure the storage backend -----

# Using GCS as an example:
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.gcs.GcsStorage
rsm.config.storage.gcs.bucket.name=gowish-staging-kafka-tiered-storage
rsm.config.storage.gcs.credentials.default=true
# The prefix can be skipped:
#rsm.config.storage.key.prefix: "tiered-storage/"

# ----- Configure the fetch chunk cache -----

rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/bitnami/kafka/tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=17179869184
# Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=16777216
# Cache retention time ms, where -1 represents infinite retention
rsm.config.fetch.chunk.cache.retention.ms=600000

This is our full server.properties:

# Listeners configuration
listeners=CLIENT://:9092,INTERNAL://:9094,EXTERNAL://:9095
listener.security.protocol.map=CLIENT:SASL_PLAINTEXT,INTERNAL:SASL_PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
advertised.listeners=CLIENT://advertised-address-placeholder:9092,INTERNAL://advertised-address-placeholder:9094
# KRaft node role
process.roles=broker
#node.id=
controller.listener.names=CONTROLLER
controller.quorum.voters=1000@kafka-staging-controller-0.kafka-staging-controller-headless.kafka-staging.svc.cluster.local:9093,1001@kafka-staging-controller-1.kafka-staging-controller-headless.kafka-staging.svc.cluster.local:9093,1002@kafka-staging-controller-2.kafka-staging-controller-headless.kafka-staging.svc.cluster.local:9093
# Kraft Controller listener SASL settings
sasl.mechanism.controller.protocol=PLAIN
listener.name.controller.sasl.enabled.mechanisms=PLAIN
listener.name.controller.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="controller_user" password="controller-password-placeholder" user_controller_user="controller-password-placeholder";
# Kafka data logs directory
log.dir=/bitnami/kafka/data
# Kafka application logs directory
logs.dir=/opt/bitnami/kafka/logs

# Common Kafka Configuration

sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
# Interbroker configuration
inter.broker.listener.name=INTERNAL
sasl.mechanism.inter.broker.protocol=PLAIN
# Listeners SASL JAAS configuration
listener.name.client.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required user_kafka_ui_staging="password-placeholder-0" user_gowish_api="password-placeholder-1" user_gowish_product_api="password-placeholder-2" user_data_warehouse="password-placeholder-3";
listener.name.client.scram-sha-256.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required;
listener.name.client.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required;
listener.name.internal.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="inter_broker_user" password="interbroker-password-placeholder" user_inter_broker_user="interbroker-password-placeholder" user_kafka_ui_staging="password-placeholder-0" user_gowish_api="password-placeholder-1" user_gowish_product_api="password-placeholder-2" user_data_warehouse="password-placeholder-3";
listener.name.internal.scram-sha-256.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="inter_broker_user" password="interbroker-password-placeholder";
listener.name.internal.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="inter_broker_user" password="interbroker-password-placeholder";
listener.name.external.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required user_kafka_ui_staging="password-placeholder-0" user_gowish_api="password-placeholder-1" user_gowish_product_api="password-placeholder-2" user_data_warehouse="password-placeholder-3";
listener.name.external.scram-sha-256.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required;
listener.name.external.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required;
# End of SASL JAAS configuration

# Custom Kafka Configuration

# Custom broker config
delete.topic.enable=true
auto.create.topics.enable=false
log.retention.hours=-1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
num.partitions=10

# ----- Add SASL/SCRAM configuration -----
listener.security.protocol.map=CLIENT:SASL_PLAINTEXT,INTERNAL:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAINTEXT

# ----- Increase Tiered Storage log level -----
log4j.logger.org.apache.kafka.server.log.remote.storage=DEBUG
log4j.logger.org.apache.kafka.server.log.remote.metadata.storage=DEBUG
log4j.logger.kafka.log.remote=DEBUG
log4j.logger.org.apache.kafka.clients.admin=DEBUG
log4j.logger.org.apache.kafka.common.network=DEBUG
log4j.logger.org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager=DEBUG
log4j.logger.io.aiven.kafka.tieredstorage=DEBUG
log4j.logger.io.aiven.kafka.tieredstorage.storage=DEBUG

# ----- Enable tiered storage -----
remote.log.storage.system.enable=true

# ----- Configure the remote log manager -----

# This is the default, but adding it for explicitness:
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager

# Put the real listener name you'd like to use here:
remote.log.metadata.manager.listener.name=INTERNAL

# ----- Configure the remote storage manager -----

# Here you need either one or two directories depending on what you did in Step 1:
remote.log.storage.manager.class.path=/opt/bitnami/kafka/plugins/tiered-storage-core/*:/opt/bitnami/kafka/plugins/tiered-storage-gcs/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager

# 4 MiB is the current recommended chunk size:
rsm.config.chunk.size=4194304

# ----- Configure the storage backend -----

# Using GCS as an example:
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.gcs.GcsStorage
rsm.config.storage.gcs.bucket.name=gowish-staging-kafka-tiered-storage
rsm.config.storage.gcs.credentials.default=true
# The prefix can be skipped:
#rsm.config.storage.key.prefix: "tiered-storage/"

# ----- Configure the fetch chunk cache -----

rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/bitnami/kafka/tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=17179869184
# Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=16777216
# Cache retention time ms, where -1 represents infinite retention
rsm.config.fetch.chunk.cache.retention.ms=600000
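
A side note on the listing above: `listener.security.protocol.map` and `sasl.mechanism.inter.broker.protocol` each appear twice with different values. Kafka loads server.properties with `java.util.Properties`, where a later assignment of a key replaces an earlier one, so the values in the custom section are the ones in effect. Below is a minimal Python sketch for spotting such repeats. It assumes a simplified `key=value` parser (no line continuations or escapes), and the function name `find_duplicate_keys` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_keys(text):
    """Return {key: [values in file order]} for keys assigned more than once.

    Simplified parser: handles only `key=value` lines and skips blank lines
    and lines starting with `#` or `!`. Line continuations and escapes, which
    java.util.Properties supports, are not handled.
    """
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Shortened sample taken from the listing in this issue.
sample = """\
sasl.mechanism.inter.broker.protocol=PLAIN
inter.broker.listener.name=INTERNAL
# custom section
sasl.mechanism.inter.broker.protocol=PLAINTEXT
"""

duplicates = find_duplicate_keys(sample)
for key, values in duplicates.items():
    # The last value is the one a last-wins loader keeps.
    print(f"{key}: {values} -> effective: {values[-1]}")
```

Running this over the rendered server.properties of each pod is one way to confirm which value actually applies before digging into broker logs.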

Investigation

If we look into the logs, we see the output from RemoteStorageManagerConfig, GcsStorageConfig, ChunkManagerFactoryConfig and DiskChunkCacheConfig, as well as the following:

INFO 2024-07-24T12:44:21.347410509Z [resource.labels.containerName: kafka] [2024-07-24 12:44:21,347] INFO Initializing topic-based RLMM resources (org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
INFO 2024-07-24T12:44:21.291971771Z [resource.labels.containerName: kafka] [2024-07-24 12:44:21,289] INFO Successfully configured topic-based RLMM with config: TopicBasedRemoteLogMetadataManagerConfig{clientIdPrefix='__remote_log_metadata_client_104', metadataTopicPartitionsCount=50, consumeWaitMs=120000, metadataTopicRetentionMs=-1, metadataTopicReplicationFactor=3, initializationRetryMaxTimeoutMs=120000, initializationRetryIntervalMs=100, commonProps={security.protocol=PLAINTEXT, bootstrap.servers=kafka-staging-broker-4.kafka-staging-broker-headless.kafka-staging.svc.cluster.local:9094}, consumerProps={security.protocol=PLAINTEXT, key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer, value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer, enable.auto.commit=false, bootstrap.servers=kafka-staging-broker-4.kafka-staging-broker-headless.kafka-staging.svc.cluster.local:9094, exclude.internal.topics=false, auto.offset.reset=earliest, client.id=__remote_log_metadata_client_104_consumer}, producerProps={security.protocol=PLAINTEXT, enable.idempotence=true, value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer, acks=all, bootstrap.servers=kafka-staging-broker-4.kafka-staging-broker-headless.kafka-staging.svc.cluster.local:9094, key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer, client.id=__remote_log_metadata_client_104_producer}} (org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
INFO 2024-07-24T12:44:21.285139388Z [resource.labels.containerName: kafka] [2024-07-24 12:44:21,284] INFO Started configuring topic-based RLMM with configs: {remote.log.metadata.common.client.bootstrap.servers=kafka-staging-broker-4.kafka-staging-broker-headless.kafka-staging.svc.cluster.local:9094, broker.id=104, remote.log.metadata.common.client.security.protocol=PLAINTEXT, cluster.id=ccAwO0rpSVKjSyO40uQkJQ, log.dir=/bitnami/kafka/data} (org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
INFO 2024-07-24T12:44:21.280203252Z [resource.labels.containerName: kafka] (io.aiven.kafka.tieredstorage.config.CacheConfig)

This gives us the impression that the plugin was loaded successfully. We have also checked each broker using Kafka UI, and on all of them we can see remote.log.storage.system.enable=true together with all the other variables that we set.

We tried, without success, to find out why we get the "Tiered Storage functionality is disabled in the broker..." error, but we couldn't find much online. The only thing related to this message was this link, which refers to the source code.

Looking at the source code, we couldn't identify anything that could throw this error other than remote.log.storage.system.enable itself (which is set).

Things we tried

  • made sure the plugin was initialized;
  • made sure the plugin has the right authentication settings: for the sake of testing, we used PLAINTEXT, and it seems that the AdminClient connects successfully;
  • checked whether specifying the config through environment variables (with extraEnvVars) instead of server.properties (with extraConfig) changes anything; it didn't;
  • made sure the K8S service account associated with the pod has access to the Google Cloud Storage bucket; it does: we created a test pod with the same SA and could connect to the bucket.

Environment

  • GKE
  • Kafka: 3.7.1
  • Bitnami Kafka helm chart: 29.3.6
  • Aiven Plugin: 2024-04-02-1712056402

Do you see any issue in our setup, or do you have any recommendations on how we can approach the debugging of this situation?

Thank you for your time. 🙏

Hi @dg-gowish

OK, this was fun to debug. I found the cause of the error message, and the issue can be closed; I don't think it's in this plugin's scope.

Investigation steps

To find it I had to go this way:

  1. Check out the Kafka source code
  2. Compile the .jar files for each component after every added log statement or change
  3. Rebuild a custom Bitnami Docker image with the changes
  4. Redeploy the new image using the Bitnami Kafka Helm chart
  5. Run the scripts for topic conversion and observe the logs.

Solution

It seems that when using the Bitnami Kafka Helm chart together with KRaft (instead of ZooKeeper), remote.log.storage.system.enable needs to be set to true on the controllers as well; setting it only on the brokers is not enough. So the values should contain:

...
broker:
  ...
  extraConfig: |
    ...
    # Enable the Remote Log storage
    remote.log.storage.system.enable=true
    # AND add all the other configurations specified by the plugin on the README.md

controller:
  ...
  extraConfig: |
    # Enable the Remote Log storage (this was missing in my case)
    remote.log.storage.system.enable=true
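
The condition described above (the flag present on both node roles) can be checked mechanically before deploying. The following is a minimal Python sketch, not part of the chart or the plugin: the helper names `parse_properties` and `roles_missing_flag` are hypothetical, the parser is simplified to plain `key=value` lines, and the sample `extraConfig` contents are made up to mirror this issue. It treats a missing key as false, which matches Kafka's default for `remote.log.storage.system.enable`.

```python
REQUIRED_KEY = "remote.log.storage.system.enable"


def parse_properties(text):
    """Parse simple `key=value` lines; a later key overrides an earlier one."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def roles_missing_flag(configs_by_role):
    """Return the sorted role names whose config does not set the flag to true."""
    return sorted(
        role
        for role, text in configs_by_role.items()
        if parse_properties(text).get(REQUIRED_KEY, "false").lower() != "true"
    )


# Hypothetical extraConfig contents mirroring the situation in this issue.
before_fix = {
    "broker": "remote.log.storage.system.enable=true\n",
    "controller": "# no tiered storage settings\n",
}
after_fix = dict(before_fix, controller="remote.log.storage.system.enable=true\n")

print(roles_missing_flag(before_fix))
print(roles_missing_flag(after_fix))
```

With the sample inputs, the first call reports the controller role as missing the flag and the second reports nothing missing.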

Conclusion

After enabling it on the controllers, I was able to enable remote storage for topics and soon started to see data in GCS.

I still have one question 🙋‍♂️

Do you think it is enough having only remote.log.storage.system.enable=true on the Controllers, or should I copy over all the config related to the Tiered Storage from the broker?

I will try to figure it out but any input is more than welcomed. 😊

I believe it's needed on the broker: there are a bunch of checks related to the logs that take it into account, e.g. this one.

Hey @ivanyu,

Sorry, I wasn't very clear. What I meant was: I will keep all tiered storage configs as they are on the brokers. But I was wondering whether all of them need to be applied on the controllers too, or whether it is enough to have only remote.log.storage.system.enable=true on the controllers, without all the others?

I'm pretty sure the majority of them (like remote.log.storage.manager.impl.prefix) don't concern the controller and aren't used there, but I unfortunately don't have the full list. But I think it's a safe assumption to say that nothing apart from remote.log.storage.system.enable is needed for the controller.

Thank you for your time @ivanyu.