TylerIlunga/book_reviews_pipeline

A simple ETL pipeline that runs the following steps:

- CSV data is loaded into S3.
- CSV data is read from S3.
  - The read is initiated by a GET request to an available endpoint (/loaddata).
  - Example: a scheduled batch job issues a request to the endpoint to load more data into the PostgreSQL database.
- Additional book data is pulled from the Google Books API for each ISBN.
- The additional book data is merged with the CSV data from S3.
- The merged data is sent to three predefined topics in a running Kafka cluster.
- A Kafka consumer reads the data from those topics.
- The consumed data is transformed to match the table schemas defined in the PostgreSQL database.
- The transformed data is inserted into the appropriate PostgreSQL table.
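
The merge, publish, and transform steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the CSV column names (`isbn`, `user_id`, `rating`), the three topic names, and every function name are hypothetical, since the README does not list them. The only external detail relied on is the public Google Books volumes endpoint, which accepts a `q=isbn:<ISBN>` query.

```python
import json
import urllib.parse
import urllib.request

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"

# Hypothetical topic names; the README only says there are three topics.
TOPICS = ("books", "users", "reviews")


def fetch_volume_info(isbn, timeout=10):
    """Look up one ISBN in the Google Books API; return its volumeInfo or {}."""
    query = urllib.parse.urlencode({"q": f"isbn:{isbn}"})
    url = f"{GOOGLE_BOOKS_URL}?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    items = payload.get("items") or []
    return items[0].get("volumeInfo", {}) if items else {}


def merge_record(csv_row, volume_info):
    """Combine a CSV row with selected API fields; CSV values win on conflicts."""
    extra = {
        "title": volume_info.get("title"),
        "authors": volume_info.get("authors", []),
        "page_count": volume_info.get("pageCount"),
    }
    return {**extra, **csv_row}


def split_by_topic(merged):
    """Map one merged record to (topic, message bytes) pairs, one per topic."""
    messages = {
        "books": {k: merged.get(k) for k in ("isbn", "title", "authors", "page_count")},
        "users": {"user_id": merged.get("user_id")},
        "reviews": {k: merged.get(k) for k in ("user_id", "isbn", "rating")},
    }
    return [(topic, json.dumps(messages[topic]).encode("utf-8")) for topic in TOPICS]


def to_row(topic, raw):
    """Decode a consumed message into (table, columns, values) for an INSERT."""
    record = json.loads(raw.decode("utf-8"))
    columns = tuple(record)
    return topic, columns, tuple(record[c] for c in columns)
```

In a running pipeline, each pair from `split_by_topic` would be handed to whichever Kafka producer client the project uses, and the tuple from `to_row` would feed a parameterized INSERT. The sketch assumes one table per topic, which the README does not confirm.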

About

ETL Pipeline using Book Review Data from Kaggle

Languages: Python 95.2%, Dockerfile 4.8%