cats-effect fs2 fs2-kafka kafka opensearch scala scala-cli scala3

wikimedia-sampler

A toy example of a Kafka-to-OpenSearch/Elasticsearch pipeline for Wikimedia data, written in purely functional Scala 3 with Cats Effect, fs2-kafka and the opensearch-java client.

It consists of:

- producer: a Kafka producer that reads from the Wikimedia event stream and writes the events to a Kafka topic
- consumer: a Kafka consumer that reads from that Kafka topic and indexes the records in OpenSearch
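The data flow between the two components can be modelled without any infrastructure. The sketch below is a hypothetical illustration that uses only the Scala standard library, not the project's fs2-kafka / opensearch-java code: a `LinkedBlockingQueue` stands in for the Kafka topic, a `TrieMap` stands in for the OpenSearch index, and the `WikiEvent` type and its fields are invented for the example.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.concurrent.TrieMap

// Hypothetical, simplified event; real Wikimedia events carry many more fields.
final case class WikiEvent(id: String, title: String, user: String)

// Stand-in for the Kafka topic: (key, JSON value) records in arrival order.
val topic = new LinkedBlockingQueue[(String, String)]()

// Stand-in for the OpenSearch index: document id -> JSON source.
val index = TrieMap.empty[String, String]

// Naive JSON encoding for the sketch only (no escaping of quotes or backslashes).
def toJson(e: WikiEvent): String =
  s"""{"id":"${e.id}","title":"${e.title}","user":"${e.user}"}"""

// Producer stage: read events from the source and append them to the topic.
def produce(events: Iterator[WikiEvent]): Unit =
  events.foreach(e => topic.put(e.id -> toJson(e)))

// Consumer stage: drain the topic and index every record under its key.
// Returns the number of records consumed.
def consume(): Int =
  val drained = Iterator.continually(topic.poll()).takeWhile(_ != null).toList
  drained.foreach((key, json) => index.put(key, json))
  drained.size
```

Keying each record by the event id and indexing it under that id means a record delivered twice overwrites the existing document instead of creating a duplicate, which is a common way to keep an at-least-once consumer idempotent.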

Setup

Install scala-cli

Start a VM (optional)

# macOS only; not needed if you already run Docker Desktop or a similar Docker runtime
./colima.sh

Start the local infrastructure:

docker-compose -f ./docker-compose.yml up

Run the app

# start producer process
scala-cli ./sampler -- produce
# start consumer process
scala-cli ./sampler -- consume
# run the producer & consumer processes concurrently
scala-cli ./sampler -- produce-consume
# for more options
scala-cli ./sampler -- help
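The subcommands above map naturally onto a small dispatcher. The sketch below is hypothetical: `Command`, `parse` and `run` are illustrative names, not the app's actual CLI code, and it uses standard-library `Future`s to run the two stages concurrently, whereas a Cats Effect program would more typically combine two `IO`s with something like `IO.both` or `parTupled`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical model of the subcommands listed above.
enum Command:
  case Produce, Consume, ProduceConsume, Help

// Map the first CLI argument to a command; anything unrecognised falls back to Help.
def parse(args: Seq[String]): Command = args.headOption match
  case Some("produce")         => Command.Produce
  case Some("consume")         => Command.Consume
  case Some("produce-consume") => Command.ProduceConsume
  case _                       => Command.Help

// Run the selected stage(s). For produce-consume, both stages are started
// as Futures before waiting, so they run concurrently; the call returns when both finish.
def run(cmd: Command, produce: () => String, consume: () => String)(using ExecutionContext): List[String] =
  cmd match
    case Command.Produce => List(produce())
    case Command.Consume => List(consume())
    case Command.ProduceConsume =>
      val producer = Future(produce())
      val consumer = Future(consume())
      val (p, c)   = Await.result(producer.zip(consumer), 10.seconds)
      List(p, c)
    case Command.Help => List("usage: produce | consume | produce-consume | help") // hypothetical help text
```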

You can inspect processed data in:
- Kafka UI, available at http://localhost:8080
- the OpenSearch Dashboards (Kibana) Dev Tools console, available at http://localhost:5601/app/dev_tools#/console

Potential improvements

- expose more OpenSearch client options as CLI parameters
- use Avro or another serialization format instead of JSON
- shut down the Kafka consumer gracefully
- use an explicit index mapping instead of dynamic mapping in OpenSearch
- create Grafana / Kibana dashboards
- add more tests

About

Samples data from the Wikimedia event stream through Kafka and indexes it in OpenSearch.
