This project has two components:
- Kafka AvroProducer -> produces cryptocurrency data.
- Hudi DeltaStreamer -> ingests the data from Kafka and writes it to Hudi tables.
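This README does not show the Avro schema the producer registers. The sketch below illustrates what a cryptocurrency price record and a quick pre-send sanity check might look like. The schema name `CryptoPrice` and the fields `symbol`, `price_usd` and `event_time` are hypothetical, not taken from `producer/producer.py`, and the check is a lightweight stand-in, not full Avro validation.

```python
import json

# Hypothetical Avro schema for one cryptocurrency price record. The real
# field names used by producer/producer.py are not shown in this README.
CRYPTO_SCHEMA = {
    "type": "record",
    "name": "CryptoPrice",
    "fields": [
        {"name": "symbol", "type": "string"},
        {"name": "price_usd", "type": "double"},
        {"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

# Python types accepted for each Avro primitive used above.
_PY_TYPES = {"string": (str,), "double": (float, int), "long": (int,)}


def avro_primitive(field_type):
    """Return the primitive type name, unwrapping logical types such as timestamp-millis."""
    return field_type["type"] if isinstance(field_type, dict) else field_type


def check_record(record, schema=CRYPTO_SCHEMA):
    """Light pre-send check: every schema field is present with a plausible Python type.

    Not full Avro validation: unions, defaults and nested records are not handled.
    """
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            raise ValueError(f"missing field: {name}")
        avro_type = avro_primitive(field["type"])
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[avro_type]):
            raise TypeError(f"{name}: expected Avro {avro_type}, got {type(value).__name__}")
    return True


# Avro serializers are typically handed the schema as a JSON string.
SCHEMA_STR = json.dumps(CRYPTO_SCHEMA)
```

For example, `check_record({"symbol": "BTC", "price_usd": 64250.5, "event_time": 1700000000000})` passes, while a record missing `event_time` raises `ValueError`.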
- Install packages:
pip install -r requirements.txt
- Start the producer:
python producer/producer.py --topic <topic-name> --bootstrap-servers <broker-server> --schema-registry <schema-registry-url> --log-file <log-file-path>
Refer to the documentation for the available configuration options.
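The producer command above takes four flags. As a rough illustration of how they could map onto client settings, here is a sketch that parses them with `argparse` and builds confluent-kafka style config dicts. Treating every flag as required, and assuming `producer.py` uses the `confluent-kafka` client at all, are assumptions; `build_parser` and `client_configs` are hypothetical helpers, not functions from this repository.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags documented for producer/producer.py."""
    parser = argparse.ArgumentParser(description="Produce Avro cryptocurrency records to Kafka.")
    # Treating all four flags as required is an assumption.
    parser.add_argument("--topic", required=True, help="Kafka topic to produce to")
    parser.add_argument("--bootstrap-servers", required=True, help="Kafka brokers, e.g. localhost:9092")
    parser.add_argument("--schema-registry", required=True, help="Schema Registry URL")
    parser.add_argument("--log-file", required=True, help="Path of the producer log file")
    return parser


def client_configs(args):
    """Map parsed flags to confluent-kafka style config dicts.

    "bootstrap.servers" is the librdkafka broker-list key and "url" is the key
    confluent-kafka's SchemaRegistryClient expects.
    """
    producer_conf = {"bootstrap.servers": args.bootstrap_servers}
    registry_conf = {"url": args.schema_registry}
    return producer_conf, registry_conf
```

Note that `argparse` turns `--bootstrap-servers` into the attribute `bootstrap_servers`, so `client_configs(build_parser().parse_args())` yields the two dicts ready to pass to a producer and a schema registry client.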
- Install Spark.
- Update the Hudi config and Kafka topic settings in delta-streamer/kafka-source.properties.
- Download the Hudi utilities bundle jar and set its path in delta-streamer/hudi-delta-streamer.sh.
- Start the DeltaStreamer:
delta-streamer/hudi-delta-streamer.sh <spark-master> <broker-server> <schema-registry-url> delta-streamer/kafka-source.properties <output-path>
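Neither `kafka-source.properties` nor `hudi-delta-streamer.sh` is reproduced here, so the following is only a guess at what they might contain, written as a Python sketch that renders the properties file and assembles a `spark-submit` command. The Hudi keys, classes and DeltaStreamer flags are real, but key names differ between Hudi releases, so check them against the bundle you download. The record key `symbol`, ordering field `event_time`, table name `crypto_prices` and jar path are hypothetical, as is the assumption that the script forwards the broker and schema registry arguments through `--hoodie-conf`.

```python
import shlex

# Hypothetical field names; they must match the producer's actual Avro schema.
RECORD_KEY = "symbol"
ORDERING_FIELD = "event_time"


def kafka_source_properties(topic):
    """Render a minimal kafka-source.properties (sketch, not the repository's file)."""
    props = {
        # Record key; NonpartitionedKeyGenerator writes a table without partition folders.
        "hoodie.datasource.write.recordkey.field": RECORD_KEY,
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
        # Kafka topic that AvroKafkaSource reads from.
        "hoodie.deltastreamer.source.kafka.topic": topic,
        # Plain Kafka consumer setting: start from the oldest offset on first run.
        "auto.offset.reset": "earliest",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def delta_streamer_command(spark_master, bootstrap_servers, registry_subject_url,
                           props_file, output_path, bundle_jar, table_name="crypto_prices"):
    """Assemble a spark-submit argv for HoodieDeltaStreamer (sketch).

    Assumes registry_subject_url is the full .../subjects/<topic>-value/versions/latest
    URL that SchemaRegistryProvider expects.
    """
    return [
        "spark-submit",
        "--master", spark_master,
        "--class", "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer",
        # Hudi's guides run Spark with the Kryo serializer.
        "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
        bundle_jar,
        "--table-type", "COPY_ON_WRITE",
        "--source-class", "org.apache.hudi.utilities.sources.AvroKafkaSource",
        "--source-ordering-field", ORDERING_FIELD,
        "--schemaprovider-class", "org.apache.hudi.utilities.schema.SchemaRegistryProvider",
        "--props", props_file,
        "--target-base-path", output_path,
        "--target-table", table_name,
        # --hoodie-conf key=value entries are layered on top of the properties file.
        "--hoodie-conf", f"bootstrap.servers={bootstrap_servers}",
        "--hoodie-conf", f"hoodie.deltastreamer.schemaprovider.registry.url={registry_subject_url}",
    ]


def as_shell(argv):
    """Quote the argv so it can be pasted into a shell script."""
    return shlex.join(argv)
```

Using `NonpartitionedKeyGenerator` keeps the sketch small; a real deployment would more likely partition by date or symbol, which needs a different key generator and a `hoodie.datasource.write.partitionpath.field` value.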
Services included in the Docker setup:
- Kafka
- Schema Registry
- Zookeeper
- Producer
- Consumer (Hudi DeltaStreamer)
- Clone the repository.
- Change to the docker directory:
cd docker
- Start the services:
docker-compose up
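After `docker-compose up`, the containers can take a while to accept connections, so it can help to wait for their ports before starting the producer by hand. The sketch below polls TCP ports with the standard library. The host/port pairs are the conventional defaults for Zookeeper (2181), Kafka (9092) and Schema Registry (8081); the actual mappings live in the compose file, which is not shown here, so treat them as assumptions. An open port only means the process is listening, not that Kafka has finished electing a controller.

```python
import socket
import time

# Conventional default ports; the real mappings are defined in the docker-compose file.
DEFAULT_ENDPOINTS = {
    "zookeeper": ("localhost", 2181),
    "kafka": ("localhost", 9092),
    "schema-registry": ("localhost", 8081),
}


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_services(endpoints=DEFAULT_ENDPOINTS, timeout=60.0):
    """Return the names of services whose port never opened (empty list means all are up)."""
    return [name for name, (host, port) in endpoints.items()
            if not wait_for_port(host, port, timeout=timeout)]
```

Calling `wait_for_services()` from a small script before `producer/producer.py` turns a confusing connection error into an explicit list of services that never came up.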