yoreei / theoremus

theoremus task

Theoremus Backend Task

Demo

The GCP server on 34.135.233.145 hosts a smaller-scale demo of the app. Try it out:

http://theoremus-challenge.live/vehicles/2021-09-24T01:40:02Z/2021-09-24T01:40:02Z/day

You also have GUI access to MongoDB through Mongo Express:

http://theoremus-challenge.live:27080/db/theoremus/vehicles

Try deleting some of the documents there and see how that affects the Web API results! (Don't worry, all data is restored on container restart)

Quickstart

How to run the task with docker-compose:

  1. Place the input data for the producer at ./kafka-producer/data/raw_gps_data.csv
  2. Modify ./kafka-producer/conf.json to match your Kafka configuration (the default values should work if you run Kafka from docker-compose)
  3. Run docker-compose up. After the services are up and running, you should see lots of messages coming from kafka-producer and kafka-consumer; unless they are errors, this is expected. At this point, you can head to localhost:27080 (Mongo Express) or localhost:9080 (Kafka-UI) to monitor the data flow through the pipeline. Finally, you can test the web API at localhost:8080/vehicles/<from>/<to>/(day|hour). For example:

http://localhost:8080/vehicles/2020-09-24T01:40:02Z/2022-09-24T01:40:02Z/day
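The same request can be issued from a script. The following is a minimal sketch, assuming the quickstart stack is running locally; the helper names vehicles_url and fetch_vehicles are hypothetical, and the response is returned as raw text because the README does not specify its format.

```python
from urllib.parse import quote
from urllib.request import urlopen


def vehicles_url(start, end, granularity, base="http://localhost:8080"):
    """Build a /vehicles/<from>/<to>/(day|hour) URL (hypothetical helper)."""
    if granularity not in ("day", "hour"):
        raise ValueError("granularity must be 'day' or 'hour'")
    # Keep ':' unescaped so the timestamps look like the README examples.
    return "{}/vehicles/{}/{}/{}".format(
        base, quote(start, safe=":"), quote(end, safe=":"), granularity
    )


def fetch_vehicles(start, end, granularity):
    """Send the GET request and return the response body as text."""
    with urlopen(vehicles_url(start, end, granularity)) as response:
        return response.read().decode("utf-8")


url = vehicles_url("2020-09-24T01:40:02Z", "2022-09-24T01:40:02Z", "day")
print(url)
```

Calling fetch_vehicles with the same arguments performs the actual request once the containers are up.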

Architecture overview

Architecture detailed

Kafka Producer is the first member of the data pipeline. It reads messages from a .CSV file, filters out the messages without valid GPS data and sends the rest to Kafka.
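The filtering step can be sketched roughly as follows. This is an illustration only, not the repository's code: the column names vehicle_id, lat and lon and the validity rule (parseable coordinates within range) are assumptions, and the step that sends rows to Kafka is left out.

```python
import csv
import io


def has_valid_gps(row):
    """Return True if the row has a parseable, in-range lat/lon pair.

    The column names 'lat' and 'lon' are assumptions for this sketch.
    """
    try:
        lat = float(row["lat"])
        lon = float(row["lon"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def valid_messages(csv_file):
    """Yield only the CSV rows that carry valid GPS data."""
    for row in csv.DictReader(csv_file):
        if has_valid_gps(row):
            yield row


# Made-up sample data; a real run would open raw_gps_data.csv instead.
sample = io.StringIO(
    "vehicle_id,lat,lon\n"
    "a1,42.6977,23.3219\n"
    "a2,,\n"
    "a3,not-a-number,23.0\n"
    "a4,95.0,10.0\n"
)
kept = list(valid_messages(sample))
print([row["vehicle_id"] for row in kept])
```

Each row that survives the filter would then be serialized and published to the Kafka topic configured in conf.json.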

Kafka is the intermediary between Kafka Producer and Kafka Consumer, allowing them to pass messages to each other asynchronously.

Kafka Consumer fetches the data from Kafka and prepares it for insertion into MongoDB. It also computes the fields "IDDay" and "IDHour" and adds them to each message to facilitate queries that aggregate by day or by hour. For example, if data.date-time.system = "2020-09-24T01:40:02Z", then IDDay = "2020-09-24T00:00:00Z" and IDHour = "2020-09-24T01:00:00Z".
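The bucket computation amounts to truncating the timestamp to the start of its day and of its hour. A minimal sketch, assuming the message is a nested dictionary in which data.date-time.system is reached as message["data"]["date-time"]["system"]; the function name add_time_buckets is hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def add_time_buckets(message):
    """Return a copy of the message with IDDay and IDHour added."""
    system_time = message["data"]["date-time"]["system"]
    moment = datetime.strptime(system_time, ISO_FORMAT).replace(tzinfo=timezone.utc)

    enriched = dict(message)
    # Truncate to midnight for the day bucket, to the full hour for the hour bucket.
    enriched["IDDay"] = moment.replace(hour=0, minute=0, second=0).strftime(ISO_FORMAT)
    enriched["IDHour"] = moment.replace(minute=0, second=0).strftime(ISO_FORMAT)
    return enriched


message = {"data": {"date-time": {"system": "2020-09-24T01:40:02Z"}}}
enriched = add_time_buckets(message)
print(enriched["IDDay"])   # 2020-09-24T00:00:00Z
print(enriched["IDHour"])  # 2020-09-24T01:00:00Z
```

Storing the buckets at insert time means a query can group on a plain field instead of recomputing the truncation for every document.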

MongoDB was chosen because 1) the incoming data naturally fits a document format, and 2) it allows more flexibility if the structure of the data changes.

Web API is a Django app that listens for GET requests in the following format: "/vehicles/<from>/<to>/(day|hour)". The parameters supplied in the URL are used to generate a MongoDB query. Because the app passes the query to the MongoDB driver as data structures rather than as query strings, the URL parameters are never parsed as query syntax, which protects the app from injection.
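To illustrate the idea of building the query from data structures, here is a sketch of how the three URL parameters could be turned into an aggregation pipeline. The pipeline shape (a range match followed by a count per bucket) and the function name build_pipeline are assumptions, as is the choice to store timestamps as ISO-8601 strings; the README does not say what the real query computes.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
BUCKET_FIELDS = {"day": "IDDay", "hour": "IDHour"}


def build_pipeline(start, end, granularity):
    """Build a MongoDB aggregation pipeline from validated URL parameters."""
    if granularity not in BUCKET_FIELDS:
        raise ValueError("granularity must be 'day' or 'hour'")
    for value in (start, end):
        # Raises ValueError if the timestamp is malformed.
        datetime.strptime(value, ISO_FORMAT)

    bucket = BUCKET_FIELDS[granularity]
    # The user-supplied values appear only as plain string operands,
    # so they cannot introduce operators of their own.
    return [
        {"$match": {"data.date-time.system": {"$gte": start, "$lte": end}}},
        {"$group": {"_id": "$" + bucket, "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]


pipeline = build_pipeline("2020-09-24T01:40:02Z", "2022-09-24T01:40:02Z", "day")
print(pipeline)
```

With a driver such as PyMongo, a list like this would be passed to the collection's aggregate method; no query string is ever assembled from user input.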

Recreate Demo

If you would like to recreate the demo hosted on theoremus-challenge.live, you can try out the alternative compose file:

docker-compose -f aws-compose.yml up

You can see the sample data used in the demo in the file: aws/mongo-seed-aws/vehicles


Languages

Python 55.1%, Go 42.0%, Dockerfile 2.9%