- Clone the repository using
git clone https://github.com/snithish/kafka-perf.git
cd kafka-perf
- Create a Python virtualenv, for example:
python3 -m venv venv
- Activate the virtualenv (bash/zsh):
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Download a large dataset. We use the NYC Taxi dataset (High Volume For-Hire Vehicle trips, February 2022, about 1 GB):
wget https://nyc-tlc.s3.amazonaws.com/trip+data/fhvhv_tripdata_2022-02.csv
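Before choosing how many rows the producer should send, it can help to know how many the file actually holds. This is a minimal sketch using standard coreutils. `count_rows` is a hypothetical helper, not part of the repository, and it assumes every CSV record sits on a single line and the file ends with a newline.

```shell
# count_rows FILE: print the number of data rows in a CSV, excluding the header line.
# Hypothetical helper; assumes one record per line (no embedded newlines in fields).
count_rows() {
  # Skip the header, count the remaining lines, and strip the padding some wc versions add.
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Usage (file name from the wget step above):
# count_rows fhvhv_tripdata_2022-02.csv
```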
- Use Docker Compose to bring up Kafka
docker-compose up -d --force-recreate
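`docker-compose up -d` returns once the containers are started, but the broker may need a few more seconds before it accepts requests, so a Kafka command run immediately afterwards can fail. A small retry loop is one way to wait. `wait_for` is a hypothetical helper written for this sketch, and the 60-attempt limit in the usage line is an arbitrary choice.

```shell
# wait_for ATTEMPTS CMD [ARGS...]: run CMD up to ATTEMPTS times, one second apart.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
# Hypothetical helper; not part of the kafka-perf repository.
wait_for() {
  local attempts=$1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the final failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about 60 s for the broker to answer a topic listing.
# wait_for 60 docker exec broker kafka-topics --bootstrap-server broker:9092 --list
```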
- Create the topic quickstart
docker exec broker \
kafka-topics --bootstrap-server broker:9092 \
--create --topic quickstart
- Run the producer script. To send more rows to Kafka, edit the script to increase the number of rows it sends.
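Before raising the row count, a quick end-to-end check with a small slice of the file can confirm that the broker and topic work. This is a minimal sketch assuming the same one-record-per-line CSV. `make_sample` and `sample.csv` are hypothetical names, and the usage lines send rows through the Kafka console producer rather than the repository's producer script.

```shell
# make_sample N IN OUT: write the header plus the first N data rows of IN to OUT.
# Hypothetical helper; assumes one CSV record per line.
make_sample() {
  head -n "$(( $1 + 1 ))" "$2" > "$3"
}

# Usage: keep 1000 rows, then send each data row (header skipped) as one message.
# make_sample 1000 fhvhv_tripdata_2022-02.csv sample.csv
# tail -n +2 sample.csv | docker exec --interactive broker \
#   kafka-console-producer --bootstrap-server broker:9092 --topic quickstart
```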