This repository contains example applications that produce to and consume from Apache Kafka. They follow many best practices for interacting with Apache Kafka, and comments explain the logic and the importance of key parameters.
These examples might not be perfect, but they can serve as a starting point for your own application.
- You probably do not need a huge number of partitions
- Rely on Kafka Connect connectors if you want to get or push data from / to an external system
- Monitor your application, specifically the Consumer lag
- Rely on the Schema Registry for all topics used to share data across multiple applications
- Always think about Data Durability
  - `acks=all` and `enable.idempotence=true` are set by default since Kafka 3.0
  - Before Kafka 3.0, you must set those parameters explicitly if you care about data durability
- Retries are automatically performed by the producer since Kafka 2.1
  - The default configuration is to retry for 2 minutes (`delivery.timeout.ms`)
- Avoid sending messages synchronously, as this can hurt performance
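
The durability and retry settings above can be sketched as a small helper that spells out the Kafka 3.0 defaults explicitly, which may be useful with older clients. This is only an illustration: the function name `durable_producer_config` is hypothetical, the property names follow the Java client, and other client libraries may spell them differently.

```python
from typing import Dict, Optional


def durable_producer_config(
    bootstrap_servers: str, overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return producer properties with the durability settings set explicitly."""
    config = {
        "bootstrap.servers": bootstrap_servers,
        # Wait for all in-sync replicas to acknowledge each write.
        "acks": "all",
        # Avoid duplicates caused by producer retries.
        "enable.idempotence": "true",
        # Upper bound on the time to deliver a record, retries included (2 minutes).
        "delivery.timeout.ms": "120000",
    }
    # Caller-supplied values win, so settings can still be tuned per environment.
    config.update(overrides or {})
    return config
```

The resulting dictionary would then be handed to whichever producer client you use; the overrides argument keeps the settings adjustable without code changes.
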
- Always inject the Kafka client configuration dynamically
  - e.g. with a configuration file, a ConfigMap or environment variables
  - You might need to update unexpected parameters in production, e.g. timeout parameters
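
One possible way to inject the client configuration dynamically is to derive it from environment variables. The sketch below assumes a hypothetical naming convention (a `KAFKA_` prefix, with underscores standing in for dots); neither the convention nor the function name `kafka_config_from_env` comes from this repository.

```python
import os
from typing import Dict, Mapping, Optional


def kafka_config_from_env(
    env: Optional[Mapping[str, str]] = None, prefix: str = "KAFKA_"
) -> Dict[str, str]:
    """Translate PREFIX_SOME_SETTING variables into some.setting client properties."""
    source = os.environ if env is None else env
    config = {}
    for name, value in source.items():
        # Only variables carrying the prefix are treated as client properties.
        if name.startswith(prefix) and len(name) > len(prefix):
            key = name[len(prefix):].lower().replace("_", ".")
            config[key] = value
    return config
```

With this convention, setting `KAFKA_REQUEST_TIMEOUT_MS=45000` in a ConfigMap or deployment descriptor would yield `request.timeout.ms=45000`, so an unexpected parameter can be changed in production without rebuilding the application.
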
- You need an error handling strategy; the default behavior is to log and stop everything. There are two kinds of exceptions that you must manage:
  - Deserialization exception: thrown if a message cannot be deserialized properly
  - Unexpected processing exception: thrown if you cannot process a message, e.g. a NullPointerException due to a bug or invalid data
  - If those exceptions are not handled, every consumer will try to process this message and crash, in an infinite loop
    - This kind of message is called a poison pill
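
As a rough sketch of one error handling strategy, the loop below catches both kinds of exceptions and parks the offending message instead of crashing. It assumes JSON payloads, and the in-memory `dead_letters` list is a stand-in for what would typically be a dead letter topic; the function name `consume_safely` is hypothetical.

```python
import json
from typing import Any, Callable, Iterable, List, Tuple


def consume_safely(
    raw_messages: Iterable[bytes],
    process: Callable[[Any], None],
    dead_letters: List[Tuple[bytes, str]],
) -> int:
    """Process each raw message; park failures instead of crashing on them."""
    processed = 0
    for raw in raw_messages:
        try:
            # Malformed JSON and invalid encodings both raise a ValueError subclass.
            value = json.loads(raw)
        except ValueError as exc:
            dead_letters.append((raw, f"deserialization: {exc}"))
            continue
        try:
            process(value)
        except Exception as exc:
            # Unexpected processing failure, e.g. a bug or invalid data.
            dead_letters.append((raw, f"processing: {exc}"))
            continue
        processed += 1
    return processed
```

Skipping the message is only one option; depending on the use case, stopping the application or retrying may be preferable. The point is that the choice is made explicitly rather than left to the default.
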
- Consumers must be idempotent
  - The default processing guarantee of Apache Kafka, and of most message brokers, is "at-least-once", so a message could be processed twice
  - Reaching "exactly-once" or "effectively-once" processing is feasible, but in most cases an idempotent consumer is easier to implement
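
A minimal sketch of an idempotent consumer, assuming each message carries a unique identifier: the handler remembers which identifiers it has already applied and ignores redeliveries. The class name `IdempotentHandler` is hypothetical, and the in-memory set is a stand-in for a durable store that would be updated together with the side effect.

```python
from typing import Any, Callable, Hashable, Set


class IdempotentHandler:
    """Apply each message at most once, keyed by a unique message identifier."""

    def __init__(self, apply: Callable[[Any], None]) -> None:
        self._apply = apply
        # Assumption: in a real application this would be a durable store.
        self._seen: Set[Hashable] = set()

    def handle(self, message_id: Hashable, payload: Any) -> bool:
        """Return True if the payload was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._apply(payload)
        # Record the identifier only after the side effect succeeded.
        self._seen.add(message_id)
        return True
```

With this in place, a message redelivered under "at-least-once" semantics is detected and skipped, so processing it twice has the same effect as processing it once.
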
- Ensure fast processing of each message: long-running operations, such as calling a remote web service, might cause repeated timeouts and consumer rebalances
  - If you need to perform a potentially long-running operation, you should do it asynchronously
    - For asynchronous processing, be aware of https://github.com/confluentinc/parallel-consumer/
- Be aware of the order of the messages and do not assume that messages will be ordered by time
- Rely on auto commit as much as possible; commit manually only if required, e.g. for asynchronous processing or an exactly-once-like implementation
- Confluent Developer website: contains plenty of tutorials, courses and examples to learn how to leverage Apache Kafka. A must-know.
- Apache Kafka documentation: page listing all parameters that could be tuned
- Optimizing Kafka: white paper explaining how to tune the parameters that impact durability, throughput, availability or latency
- Confluent documentation: Schema Registry, ksqlDB, Ansible (cp-ansible), many connectors, Kubernetes (Confluent for Kubernetes), Confluent Platform and Confluent Cloud documentation
- Confluent support: can be used for development-related questions