csarrazi / kafka-white-label-app


White label applications

This repository contains example applications that produce to and consume from Apache Kafka. They illustrate many best practices for interacting with Apache Kafka, with comments explaining the logic and the importance of key parameters.

These examples might not be perfect, but they can serve as base source code for your own applications.

Overall tips & tricks

Use cases & Architecture

  • You probably do not need a huge number of partitions
  • Rely on Kafka Connect connectors if you want to pull data from or push data to an external system
  • Monitor your application, especially the consumer lag
  • Rely on the Schema Registry for all topics used to share data across multiple applications (see the sketch after this list)
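
As an illustration, here is a minimal sketch of a producer configuration wired to the Schema Registry. It assumes the Confluent Avro serializer (io.confluent:kafka-avro-serializer) is on the classpath; the broker and registry addresses are placeholders to be injected from your environment.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class SchemaRegistryConfigExample {

    public static Properties producerProperties() {
        Properties props = new Properties();
        // Placeholder addresses: inject the real values dynamically.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers and validates schemas against the
        // Schema Registry, so every application sharing the topic agrees
        // on the same data contract.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }
}
```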

Producer

  • Always think about data durability
    • acks=all and enable.idempotence=true are set by default since Kafka 3.0
    • Before Kafka 3.0, you must set those parameters explicitly if you care about data durability
  • Retries are performed automatically by the producer since Kafka 2.1
    • The default configuration is to retry for 2 minutes (delivery.timeout.ms)
  • Avoid sending messages synchronously, as this could hurt performance
  • Always inject the Kafka client configuration dynamically (see the sketch after this list)
    • e.g. with a configuration file, a ConfigMap or environment variables
    • You might need to update unexpected parameters in production, e.g. timeout parameters
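
A minimal producer sketch tying these points together: the durability settings are made explicit (redundant since Kafka 3.0, but required on older clients), the configuration is loaded from a file instead of being hard-coded, and the message is sent asynchronously with a callback. The topic name and the producer.properties path are placeholders.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducerExample {

    public static void main(String[] args) throws IOException {
        // Load environment-specific settings (bootstrap.servers, timeouts, ...)
        // dynamically, so they can be changed in production without a rebuild.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("producer.properties")) {
            props.load(in);
        }
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Defaults since Kafka 3.0; set them explicitly on older clients.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // The producer retries on its own until this timeout expires
        // (2 minutes by default).
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("my-topic", "key", "value");
            // Asynchronous send: handle the outcome in a callback instead of
            // blocking on the returned Future.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Reached only once the producer has exhausted its retries.
                    System.err.println("Delivery failed: " + exception.getMessage());
                }
            });
        } // close() flushes any outstanding messages.
    }
}
```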

Consumer

  • You need an error handling strategy; the default behavior is to log and stop everything. There are two kinds of exceptions that you must manage:
    • Deserialization exception: thrown if a message cannot be deserialized properly
    • Unexpected processing exception: thrown if you cannot process a message, e.g. a NullPointerException due to a bug or invalid data
    • If those exceptions are not handled, every consumer will try processing the same message and crash in an infinite loop
    • Such a message is called a poison pill (see the sketch after this list)
  • Consumers must be idempotent
    • The default processing guarantee of Apache Kafka, and of most message brokers, is "at-least-once", so a message could be processed twice
    • Reaching "exactly-once" or "effectively-once" processing is feasible, but in most cases an idempotent consumer is easier to implement
  • Ensure fast processing of each message; long-running operations, such as calls to a remote web service, might trigger timeouts and consumer rebalances
  • Be aware of the ordering of messages and do not assume that messages will be ordered by time
  • Rely on auto commit as much as possible; commit manually only if required, e.g. due to asynchronous processing or an exactly-once-like implementation
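
Below is a minimal sketch of such a consumer loop, assuming a plain Apache Kafka Java client: poll() raises RecordDeserializationException for a record that cannot be deserialized, and the loop seeks past the poison pill instead of crashing forever, while processing is kept idempotent with an in-memory guard (a database or cache in a real application). The topic name and configuration values are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RecordDeserializationException;

public class ResilientConsumerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder configuration: inject the real values dynamically.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-app");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Auto commit (the default) is fine for fast, synchronous processing.
        props.put("enable.auto.commit", "true");

        // Naive idempotency guard; use a database or cache in production.
        Set<String> alreadyProcessed = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                try {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Idempotent processing: a redelivered message is a no-op.
                        if (alreadyProcessed.add(record.key())) {
                            process(record); // must stay fast to avoid rebalances
                        }
                    }
                } catch (RecordDeserializationException e) {
                    // Poison pill: log it (or forward it to a dead-letter topic),
                    // then seek past the offending offset so the group does not
                    // crash in an infinite loop.
                    System.err.println("Skipping undeserializable record at "
                            + e.topicPartition() + " offset " + e.offset());
                    consumer.seek(e.topicPartition(), e.offset() + 1);
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.key() + " -> " + record.value());
    }
}
```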

Links and references

  • Confluent Developer Website: plenty of tutorials, courses and examples to learn how to leverage Apache Kafka. A must-know.
  • Apache Kafka documentation: lists all parameters that can be tuned
  • Optimizing Kafka: a white paper explaining the parameters that impact durability, throughput, availability and latency
  • Confluent documentation: Schema Registry, ksqlDB, Ansible (cp-ansible), many connectors, Kubernetes (Confluent for Kubernetes), Confluent Platform and Confluent Cloud documentation
  • Confluent support: Confluent support can be used for development-related questions
