gopigof / CSYE7200_FinalProject

Local Crime News Aggregator

The Local Crime News Aggregator is a dashboard system consisting of a Scala backend and a configurable, near real-time Kibana dashboard. Crime incident data is fetched from two different types of sources.

The data is retrieved, processed, and normalized by a Spark + Scala backend, then pushed to Elasticsearch nodes running in Kubernetes pods. Kibana, deployed as a Kubernetes Deployment, fetches data from the Elasticsearch index to render the heatmap.

System Architecture

Tech Stack

  • Data Processing Backend: Scala, Apache Spark
  • Dashboards: Kibana (ELK Stack), Beats (system heartbeats)
  • Databases: Elasticsearch (ELK Stack)
  • Infrastructure:
    • Backend is containerized as a Docker image
    • Elasticsearch is deployed as a K8s StatefulSet with 3 replicas, giving 3 Elasticsearch instances with the data sharded between them
    • Kibana is deployed as a K8s Deployment with a PodDisruptionBudget, which limits how many Kibana pods can be taken down at once during voluntary disruptions such as node drains. This supports the 99% availability target and keeps the endpoints responsive
    • AWS services such as EC2, ELB, EBS, and S3 are used extensively in the K8s configurations
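
As an illustration of the PodDisruptionBudget mentioned above, the following is a minimal sketch of a helper that prints such a manifest. The function name pdb_manifest, the label key app, and the example values are assumptions made for this sketch; they are not taken from the k8s-stack folder.

```shell
#!/bin/sh
# Sketch: print a PodDisruptionBudget manifest for a set of labelled pods.
# ASSUMPTION: pods are selected by an "app" label; the real manifests in
# k8s-stack may use different labels and names.
pdb_manifest() {
  name="$1"           # value of the pods' "app" label (hypothetical)
  min_available="$2"  # pods that must stay up during voluntary disruptions
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${name}-pdb
spec:
  minAvailable: ${min_available}
  selector:
    matchLabels:
      app: ${name}
EOF
}
```

With kubectl configured for the cluster, the output could be applied with pdb_manifest kibana 1 | kubectl apply -f -. A budget like this only constrains voluntary evictions, so it complements running multiple replicas rather than replacing it.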

Directory Structure

.
├── Dockerfile - Builds a Docker image from the compiled Scala code
├── README.md
├── build.sbt 
├── dashboard - Kibana dashboard configuration
├── docker-elk-stack - ELK stack deployment configurations using Docker Compose
├── k8s-stack - ELK stack deployment configurations using Kubernetes and AWS 
├── project
├── src
├── static_media - Repo related static media files
├── target
└── utilities - extra utilities created over the course of the project 

Acceptance Criteria

  • Dashboard query and filter response times must be under 5 seconds.
    • Most queries responded in 200 ms to 250 ms. Very complex queries and cache misses made up the 99th-percentile queries, at around 1 s response time.
  • The dashboard's service availability must be greater than 99%.
    • The Kibana dashboards run as a K8s (Kubernetes) Deployment protected by a PodDisruptionBudget. A PodDisruptionBudget keeps a minimum number of the selected pods running during voluntary disruptions (for example node drains and upgrades), provided system resources are available. It does not protect against involuntary failures such as node crashes.
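
The 99th-percentile figure above can be checked from raw measurements. The sketch below assumes query latencies in milliseconds have been collected one per line (for example in a hypothetical file latencies.txt) and computes a nearest-rank percentile. The function name percentile is an assumption for this sketch, not part of the project.

```shell
#!/bin/sh
# Sketch: nearest-rank percentile of numbers read from stdin, one per line.
# P must be an integer between 1 and 100.
percentile() {
  p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # nearest rank = ceil(p * NR / 100), computed with integer arithmetic
      rank = int((p * NR + 99) / 100)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Running percentile 99 < latencies.txt prints the latency that 99% of the samples do not exceed, which can then be compared against the 5-second limit.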

Steps to Execute

Data Processing Backend

  • The sbt-native-packager plugin generates Docker images after the Scala project is built with sbt
  • The sbt docker:publishLocal command builds a Docker image locally (prerequisite: Docker must be installed)

Local ELK Stack Deployment

  • The database and dashboards can be deployed locally for testing purposes. This setup provides no uptime guarantees.
  • The docker-elk-stack folder contains all the configuration required to deploy ELK stack v8.7.0 through Docker Compose.
  • Change into the folder: cd docker-elk-stack
  • Run docker compose up -d to start Elasticsearch and Kibana and complete the required initial setup.
  • Run docker compose ls and docker ps to verify the running containers.
  • Elasticsearch and Kibana both expose REST APIs over HTTP. Elasticsearch listens on port 9200 and Kibana on port 5601.
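
Once the containers are up, the two ports listed above can be probed from the host. The following is a rough sketch: the function names and the ELK_SCHEME and ELK_HOST variables are assumptions made for this example. Elasticsearch 8.x enables security by default, so depending on how docker-elk-stack is configured, https and credentials may be needed.

```shell
#!/bin/sh
# Sketch: build the URL of a status endpoint for a locally deployed service.
# Ports 9200 and 5601 come from the README; scheme and host are ASSUMPTIONS
# that can be overridden through ELK_SCHEME and ELK_HOST (hypothetical names).
service_url() {
  service="$1"
  scheme="${ELK_SCHEME:-http}"
  host="${ELK_HOST:-localhost}"
  case "$service" in
    elasticsearch) echo "${scheme}://${host}:9200/_cluster/health" ;;
    kibana)        echo "${scheme}://${host}:5601/api/status" ;;
    *)             echo "unknown service: $service" >&2; return 1 ;;
  esac
}

# Succeed only if the endpoint answers without an HTTP error status.
# Extra arguments are passed through to curl, e.g. -u user:password or -k.
check_service() {
  service="$1"; shift
  url="$(service_url "$service")" || return 1
  curl --silent --fail --max-time 5 --output /dev/null "$@" "$url"
}
```

For example, check_service elasticsearch && check_service kibana returns success only when both endpoints respond. If authentication is enabled, credentials can be appended, as in check_service elasticsearch -u elastic:<password>.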

Deploying ELK Stack on AWS

  • Deploying the ELK Stack on AWS through Kubernetes requires the following tools: kops, kubectl, awscli
  • Create an IAM user with the required IAM policies attached. Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values in the local terminal environment
  • Create the Kubernetes cluster (update the values as required):
kops create cluster k8s-cluster.example.com \
  --node-count 3 \
  --zones $ZONES \
  --node-size $NODE_SIZE \
  --control-plane-size $CONTROL_PLANE_SIZE \
  --control-plane-zones $ZONES \
  --networking cilium \
  --topology private \
  --bastion="true" \
  --yes
  • After the cluster is created, validate the cluster infrastructure using kops validate cluster
  • Apply the configurations from each of the folders, in the order given by their naming convention, using kubectl apply -f <dir_name>
  • Delete the K8s cluster using kops delete cluster k8s-cluster.example.com (kops only previews the deletion unless --yes is added)
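
The "apply each folder in naming order" step can be scripted. The sketch below assumes the folders carry sortable name prefixes; the function name apply_in_order and the APPLY_CMD variable are hypothetical and exist only in this example.

```shell
#!/bin/sh
# Sketch: apply every sub-directory of a root folder in lexical name order.
# APPLY_CMD defaults to "kubectl apply -f"; setting it to "echo" gives a
# dry run that only lists the directories in the order they would be applied.
apply_in_order() {
  root="$1"
  apply_cmd="${APPLY_CMD:-kubectl apply -f}"
  # Shell globs expand in sorted order, so prefixes like 01-, 02- work.
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # $apply_cmd is intentionally unquoted so it splits into command + flags
    $apply_cmd "${dir%/}" || return 1
  done
}
```

A dry run such as APPLY_CMD=echo apply_in_order k8s-stack shows the order before anything is applied. Because the ordering is lexical, numeric prefixes need zero-padding (02 before 10) to sort as intended.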

Notes

  • Elasticsearch, Kibana, and the related extensions (such as Logstash and Beats) are commonly referred to as the ELK Stack.
  • Kubernetes is usually abbreviated to K8s.
