Taxi Trip Data Processing

Technologies used and trade-offs made

Node.js - simplifies development of asynchronous applications
MongoDB - chosen over Redis because a MongoDB replica set is simpler to scale
Docker - industry-standard container engine
Docker Compose - facilitates local development
Helm - streamlines creating multiple Kubernetes resources with chart templates. It also offers pre-packaged charts for popular open-source projects, such as the MongoDB replica set chart used in this setup
Kubespray - provides more advanced automation options than kubeadm and offers suitable defaults, which reduces the effort required on the developer's side
Kubernetes - the cluster is not HA; a single-node cluster is set up due to the requirement constraints: a single VM with 4 cores and 8 GB RAM

Assignment details and problem statement can be found here.

Ingester - background worker, a pub/sub listener that ingests incoming taxi data into the database
Counter - recurring background job that calculates the total number of trips over the last hour and stores it in a separate collection. Only drop-off events count as trips
App API - web service that serves taxi trip metrics on request
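
As a rough illustration of what the Counter computes, here is a minimal sketch in plain Node.js. It is not the code from this repository: the function name `countDropoffs` and the event fields `type` and `timestamp` are assumptions made for the example, and the real job works against MongoDB rather than an in-memory array. The returned object mirrors the shape of the metrics response.

```javascript
// Sketch only: the event shape { type, timestamp } is a hypothetical
// stand-in for whatever the Ingester actually stores.
const ONE_HOUR_MS = 60 * 60 * 1000;

// Count drop-off events whose timestamp falls within the hour before `now`.
function countDropoffs(events, now = new Date()) {
  const end = now.getTime();
  const start = end - ONE_HOUR_MS;
  const count = events.filter((event) => {
    const t = new Date(event.timestamp).getTime();
    return event.type === 'dropoff' && t > start && t <= end;
  }).length;
  // Same shape as the /metrics response.
  return { updated_at: now.toISOString(), dropoff: { count } };
}
```

Pick-up events and anything older than one hour are ignored, so only completed trips inside the window contribute to the count.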

Data ingested into the trips collection is stored for 2 hours; housekeeping is done by the mongod process.
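
Expiry handled by mongod itself is usually achieved with a TTL index. The sketch below shows one way to declare it with the Node.js MongoDB driver, assuming the documents carry a BSON date field; the field name `created_at` and the helper name `ensureTripTtlIndex` are assumptions for illustration, not taken from this repository.

```javascript
// Sketch only: `created_at` is an assumed field name and must hold a BSON Date.
const TRIP_TTL_SECONDS = 2 * 60 * 60; // 2 hours

// `trips` is expected to be a Collection from the Node.js MongoDB driver.
async function ensureTripTtlIndex(trips) {
  // mongod removes documents once created_at is older than the TTL.
  return trips.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: TRIP_TTL_SECONDS }
  );
}
```

The TTL monitor in mongod runs periodically (about once a minute), so a document may outlive the 2 hour limit by a short time before it is removed.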

Once the script has finished, you can start querying the metrics API with the following command:

curl $(kubectl get svc | grep taxi-app | awk '{print $3}')/metrics

{
  "updated_at": "2019-09-02T09:12:17.712Z",
  "dropoff": {
    "count": 152
  }
}

You may need to wait a minute for the data to be populated.

Refer to screencast:

The setup was tested on a VM created from the ubuntu-1604-xenial-v20190816 image; refer to the docs/create_vm.sh script.

A few TODOs have been left across the codebase; these are enhancements I would ideally make for a production-ready system.

Apart from the above, here are some other infrastructure/configuration/architecture improvements I feel are necessary:

Web Service that provides basic metrics over live New York taxi data.
