- Rancher Desktop: 1.4.1
- Kubernetes: v1.22.6
- kubectl: v1.23.3
- Helm: v3.7.2
tl;dr: `./scripts/up.sh`
Create the namespaces:

```bash
kubectl create namespace kafka --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace opensearch --dry-run=client -o yaml | kubectl apply -f -
```
Install Strimzi:

```bash
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
```

Deploy the Kafka cluster:

```bash
kubectl apply -f kafka/values.yaml -n kafka
```
Wait for the cluster to be ready:

```bash
kubectl wait kafka/my-kafka-cluster --for=condition=Ready --timeout=300s -n kafka
```
Install Kafka-UI:

```bash
helm repo add kafka-ui https://provectus.github.io/kafka-ui-charts
helm upgrade --install my-kafka-ui kafka-ui/kafka-ui --namespace kafka -f kafka-ui/values.yaml
```

Port-forward the Kafka-UI service:

```bash
kubectl port-forward svc/my-kafka-ui -n kafka 8080:80
```
Then visit the Kafka-UI at http://localhost:8080.
Follow the OpenSearch Helm guide to deploy the OpenSearch service and dashboards:

```bash
helm repo add opensearch https://opensearch-project.github.io/helm-charts/
helm repo update
helm upgrade --install my-opensearch opensearch/opensearch --namespace opensearch -f opensearch/values.yaml
helm upgrade --install my-opensearch-dashboards opensearch/opensearch-dashboards --namespace opensearch -f opensearch-dashboards/values.yaml
```
Port-forward the OpenSearch Dashboards service:

```bash
kubectl port-forward svc/my-opensearch-dashboards -n opensearch 5601
```
Then visit the OpenSearch dashboard at http://localhost:5601 with the following credentials:

- username: `admin`
- password: `admin`
Verify the OpenSearch service by testing a few index and search operations from the OpenSearch dashboard (Dev Tools console), or from the command line as sketched below.
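The exact operations aren't listed here; as a minimal smoke test you can also hit the OpenSearch REST API directly. This sketch assumes the chart's default API service name (`opensearch-cluster-master`, which `opensearch/values.yaml` may override) and the chart's default self-signed TLS plus the `admin`/`admin` credentials above:

```bash
# assumption: opensearch-cluster-master is the chart's default service name;
# check `kubectl get svc -n opensearch` if your values.yaml overrides it
kubectl port-forward svc/opensearch-cluster-master -n opensearch 9200 &

# index a test document (-k because the demo TLS certificate is self-signed)
curl -ku admin:admin -X POST https://localhost:9200/my-test-index/_doc \
  -H 'Content-Type: application/json' -d '{"message": "hello opensearch"}'

# search it back
curl -ku admin:admin 'https://localhost:9200/my-test-index/_search?q=message:hello'
```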
Attention: the replication factor must be less than or equal to the number of Kafka brokers.
```bash
# create a topic
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --create --topic my-first-topic --partitions 1 --replication-factor 1

# list topics
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 --list

# describe a topic
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --describe --topic my-first-topic

# delete a topic
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --delete --topic my-first-topic
```
```bash
# produce keyed messages (key and value separated by ':')
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic --property parse.key=true --property key.separator=:

# consume from the beginning, printing timestamp, key, and value
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic --from-beginning --formatter kafka.tools.DefaultMessageFormatter \
  --property print.timestamp=true --property print.key=true --property print.value=true
```
Create a topic with multiple partitions:

```bash
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --create --topic my-first-consumer-group-topic --partitions 3 --replication-factor 1
```
Create the consumer group by starting two consumers in it (run each in a separate terminal):

```bash
kubectl -n kafka run kafka-consumer-group-0 -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic --group my-first-consumer-group --from-beginning

kubectl -n kafka run kafka-consumer-group-1 -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic --group my-first-consumer-group --from-beginning
```
Send some messages:

```bash
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic
```
Attention: consumers in the same group split the messages of a topic between them (each message is delivered to only one member of the group), while different groups attached to the same topic each receive every message.
List all consumer groups:

```bash
kubectl -n kafka run kafka-consumer-group-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-consumer-groups.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 --list
```
Delete a consumer group:

```bash
kubectl -n kafka run kafka-consumer-group-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-consumer-groups.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --delete --group my-first-consumer-group
```
Reset the offsets of a consumer group to replay the topic:

```bash
kubectl -n kafka run kafka-consumer-group-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-consumer-groups.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --reset-offsets --to-earliest --group consumer-opensearch-demo --all-topics --execute
```
Attention: follow the local_dev doc to set up the prerequisites.
Create the topics:

```bash
kubectl apply -f kafka/topics.yaml -n kafka
```
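`kafka/topics.yaml` isn't reproduced here, but with Strimzi, topics are declared as `KafkaTopic` custom resources handled by the topic operator; a minimal sketch of one such manifest (the topic name and sizing are illustrative) looks like:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-first-topic               # illustrative name
  labels:
    strimzi.io/cluster: my-kafka-cluster  # must match the Kafka cluster name
spec:
  partitions: 3
  replicas: 1
```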
- `StickyPartitioner` improves the performance of batch producing (see `ProducerDemoWithCallback.java`)
- messages with the same key are sent to the same partition (see `ProducerDemoKey.java`); a console demonstration follows below
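A quick way to observe key-based partitioning with the console tools (reusing the multi-partition topic created above): produce several messages that share a key, then print the partition on the consumer side. `print.partition` is supported by the default console formatter in this Kafka version:

```bash
# produce keyed messages; reuse a key, e.g. type `order-1:created` then `order-1:paid`
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic --property parse.key=true --property key.separator=:

# messages sharing a key should show the same partition number
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic --from-beginning \
  --property print.key=true --property print.partition=true
```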
- consumer groups and partition rebalance
  - moving partitions between consumers is called a rebalance
  - eager rebalance: all consumers stop and rejoin
  - cooperative rebalance (incremental rebalance): reassigns only a small subset of the partitions; see the sketch below
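To try cooperative rebalancing from the console consumer (a sketch; `CooperativeStickyAssignor` ships with the standard Kafka client, and the group name is illustrative):

```bash
# start this twice with different pod names (kafka-consumer-coop-0/-1);
# when the second member joins, only part of the partitions move instead
# of the whole group stopping
kubectl -n kafka run kafka-consumer-coop-0 -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-consumer-group-topic --group my-coop-group \
  --consumer-property partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```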
- auto offset commit: `.commitAsync()` is called periodically between `.poll()` calls
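With the console consumer, the auto-commit behaviour can be tuned through standard client properties (a sketch; `1000` is an arbitrary interval, and the client defaults are `enable.auto.commit=true` with `auto.commit.interval.ms=5000`):

```bash
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic --group my-autocommit-group \
  --consumer-property enable.auto.commit=true \
  --consumer-property auto.commit.interval.ms=1000
```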
- kafka topic availability: `acks=all` (`-1`) and `min.insync.replicas=2` is the most popular option for data durability and availability, and allows you to withstand the loss of at most one Kafka broker; see the sketch below
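A sketch of that setup, assuming a cluster with at least 3 brokers (the local cluster here may run fewer; adjust `--replication-factor` accordingly, per the attention note above):

```bash
# a topic that tolerates the loss of one broker: RF=3, min ISR=2
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --create --topic my-durable-topic --partitions 3 --replication-factor 3 \
  --config min.insync.replicas=2

# a producer that waits for the in-sync replicas to acknowledge every write
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-durable-topic --producer-property acks=all
```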
- idempotent producer
  - won't introduce duplicates on network errors
  - the kafka `v3.0+` producer is safe by default:
    - `acks=-1`
    - `enable.idempotence=true`
    - `max.in.flight.requests.per.connection=5`
    - `retries=2147483647`
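On pre-3.0 clients (or just to be explicit), the same safe settings can be passed to the console producer:

```bash
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic \
  --producer-property enable.idempotence=true \
  --producer-property acks=all \
  --producer-property max.in.flight.requests.per.connection=5 \
  --producer-property retries=2147483647
```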
- compression
  - message compression at the producer level (see Cloudflare's benchmarks)
    - pros
      - smaller request size
      - lower latency
      - better throughput
      - better disk utilization in Kafka
    - cons (minor)
      - producers must spend some CPU cycles on compression
      - consumers must spend some CPU cycles on decompression
    - always use compression at the producer level
  - message compression at the broker/topic level
    - `compression.type=producer`: the broker stores the batches exactly as compressed by the producer
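A sketch of producer-level compression with the console producer (`snappy` is an arbitrary choice; `gzip`, `lz4`, and `zstd` also work):

```bash
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic \
  --producer-property compression.type=snappy
```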
- message batching
  - `linger.ms` is the time in milliseconds to wait before sending a batch of messages
  - `batch.size` is the maximum size of a batch in bytes
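Both settings combine naturally with compression on the producer (the values here are illustrative, not recommendations):

```bash
# wait up to 20 ms to fill batches of up to 32 KB before sending
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --topic my-first-topic \
  --producer-property linger.ms=20 \
  --producer-property batch.size=32768 \
  --producer-property compression.type=snappy
```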
- delivery semantics
  - at most once: offsets are committed as soon as the message is received. If the processing goes wrong, the message is lost (it won't be read again)
  - at least once (preferred): offsets are committed after the message is processed. If the processing goes wrong, the message will be read again, which can result in duplicate processing, so make sure your processing is idempotent
  - exactly once: achievable for Kafka => Kafka workflows using the Transactional API (easy with the Kafka Streams API). For Kafka => Sink workflows, use an idempotent consumer
Kafka Connect makes it easy to stream from numerous sources into Kafka and from Kafka into numerous sinks, with hundreds of available connectors.

- configurable data pipelines
- integrates external systems with Kafka
- supported by the Strimzi operator
Kafka Streams is a data processing and transformation library within Kafka.

- Java API
- exactly-once capabilities
- one record at a time (no batching)
- partition count
  - small cluster (< 6 brokers): 3 × the number of brokers (e.g. 3 brokers → 9 partitions)
  - big cluster (> 12 brokers): 2 × the number of brokers
  - more partitions mean more leader elections for Zookeeper to perform
- replication factor
  - at least 2, usually 3, at most 4
  - a higher replication factor means
    - better durability
    - higher availability
    - but more latency
    - and more disk space
- cluster limits
  - with Zookeeper
    - max 200,000 partitions (Zookeeper scaling limit)
    - 4,000 partitions per broker
  - with KRaft
    - potential for millions of partitions
- topics are made of partitions, and partitions are made of segments.
- log cleanup policies
  - delete
    - by time
    - by size
  - compact
    - log compaction keeps the most recent value for each key (see the example below)
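A sketch of creating a compacted topic with the same console tooling (`my-compacted-topic` is an illustrative name):

```bash
kubectl -n kafka run kafka-topic-operator -ti --image=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 --rm=true --restart=Never -- \
  bin/kafka-topics.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 \
  --create --topic my-compacted-topic --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact
```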
- topic naming convention: `<message type>.<dataset name>.<data name>.<data format>` (a hypothetical example: `logging.orders.clickstream.avro`)
tl;dr: `./scripts/down.sh`
```bash
kubectl delete -f kafka/ -n kafka
kubectl delete -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
helm uninstall my-kafka-ui -n kafka
helm uninstall my-opensearch -n opensearch
helm uninstall my-opensearch-dashboards -n opensearch
kubectl delete namespace kafka
kubectl delete namespace opensearch
```