mohantom / kafka-streams-docker

Kafka Streams on Docker

Adapted from Udemy course: Apache Kafka Series - Kafka Streams for Data Processing

Kafka-streams-examples

Tech stack

  • Kafka
  • Kafka Streams
  • Elasticsearch
  • MongoDB
  • Spring Boot
  • Spring Retry
  • React UI
  • Firebase (authentication)

Modules

  1. movie-loader: loads movies from the CSV file movies_enriched.csv into the Kafka topic movies
  2. movie-mongo: subscribes to movies and saves every record to MongoDB
  3. movie-streams: subscribes to movies, counts movies by year, and publishes the counts to movies-year
  4. movie-es: subscribes to movies and movies-year and saves them to es7 under the same index names
  5. movie-ui: shows the latest movies and the top 250 movies by rating
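
The aggregation in step 3 can be illustrated in plain Python. This is a minimal sketch of the count-by-year logic only, not the module's Kafka Streams topology. The sample records and the `count_by_year` helper are hypothetical, and the `year` field name is assumed from the CSV columns used in the duplicate check further down.

```python
from collections import Counter

def count_by_year(movies):
    """Return the final per-year counts and the running updates (hypothetical helper)."""
    counts = Counter()
    updates = []
    for movie in movies:
        counts[movie["year"]] += 1
        # Conceptually, a streaming count produces a new total whenever a key changes.
        updates.append((movie["year"], counts[movie["year"]]))
    return counts, updates

# Made-up sample records standing in for messages on the movies topic.
sample = [
    {"title": "The Terminator", "year": 1984},
    {"title": "Ghostbusters", "year": 1984},
    {"title": "Terminator 2: Judgment Day", "year": 1991},
]
counts, updates = count_by_year(sample)
print(dict(counts))
print(updates)
```

The `updates` list mirrors what a consumer of movies-year would see over time: one record per change, where the latest value for a year is the current count.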

How to launch

// build project
cd src
mvn clean install

// start docker desktop

// start application
cd infrastructure/docker
docker-compose -f common.yml up --build -d
docker-compose -f common.yml -f movies.yml up --build -d
// docker-compose -f common.yml -f words.yml up --build -d

// load movies to mongo/es7
http://localhost:8040/mongo/movie/load

// go to UI
http://localhost:3000/home

// to check logs
docker logs wordcount
docker logs wordcountinput
docker logs wordcountoutput -f

// to shutdown and cleanup
docker-compose -f common.yml -f movies.yml down
docker system prune --volumes

movie-loader

to rescan (enrich) movies

  1. http://localhost:8010/loader/movie/scan?append=false

  2. drop the mongo movie collection and the es7 movies index, or run docker system prune --volumes

  3. reload movie data to mongo and es7: http://localhost:8010/loader/movie/load

check movie duplicates

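import pandas as pd

movies = pd.read_csv("data/movies_enriched.csv")

# keep=False marks every row of a duplicated (title, year) pair, not just the repeats
dups = movies[movies.duplicated(['title', 'year'], keep=False)]
dups[['title', 'year', 'imdbid']].to_csv("data/movies_dups.csv", index=False)

# keep the first row of each (title, year) pair; index=False avoids writing an extra index column
movies_unique = movies.drop_duplicates(subset=['title', 'year'], keep='first')
movies_unique.to_csv("data/movies_unique.csv", index=False)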
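
For a quick check without pandas, the same split can be sketched with the standard library. This is an illustrative equivalent under stated assumptions, not part of the project: `split_duplicates` is a hypothetical helper, and the sample rows and their `imdbid` values are placeholders rather than real data.

```python
from collections import defaultdict

def split_duplicates(rows, keys=("title", "year")):
    """Return (all rows sharing a key with another row, first row per key)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    # Like duplicated(keep=False): every row whose key occurs more than once.
    dups = [row for row in rows if len(groups[tuple(row[k] for k in keys)]) > 1]
    # Like drop_duplicates(keep='first'): the first row seen for each key.
    unique = [group[0] for group in groups.values()]
    return dups, unique

# Placeholder rows; in practice they would come from csv.DictReader on the CSV file.
rows = [
    {"title": "Movie A", "year": "2001", "imdbid": "id-1"},
    {"title": "Movie A", "year": "2001", "imdbid": "id-2"},
    {"title": "Movie B", "year": "2005", "imdbid": "id-3"},
]
dups, unique = split_duplicates(rows)
print([r["imdbid"] for r in dups])
print([r["imdbid"] for r in unique])
```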

Mongo

try this in browser: http://localhost:8040/mongo/movie/query?title=Terminator

find top 250 rated movies: http://localhost:8040/mongo/movie/all?size=250&sortField=rating&direction=DESC&page=0

Or query with MongoDB Compass using this connection string: mongodb://root:example@localhost:27017/?authSource=admin&readPreference=primary&appname=MongoDB%20Compass%20Community&ssl=false
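
The paged query URL above can also be built programmatically. The sketch below only assembles the URL from the four parameters that appear in it (size, sortField, direction, page); the `top_rated_url` helper is hypothetical, and nothing here is confirmed about which other values the endpoint accepts.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8040/mongo/movie/all"

def top_rated_url(size=250, page=0, sort_field="rating", direction="DESC"):
    """Build the paged query URL (hypothetical helper; defaults match the URL above)."""
    params = {"size": size, "sortField": sort_field, "direction": direction, "page": page}
    return f"{BASE_URL}?{urlencode(params)}"

print(top_rated_url())
print(top_rated_url(size=50, page=1))
```

With the stack running, the resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`.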

ES7

ES High Level REST Client API

Use Postman to query the indices:

localhost:9200/movies/_search
localhost:9200/movies-year/_search
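
As an alternative to Postman, a search request can be prepared with the Python standard library. This is a sketch under assumptions: `build_search_request` is a hypothetical helper, and sorting on `rating` assumes the movies index maps that field as a sortable numeric type, which the project does not confirm here. The request is only constructed, not sent.

```python
import json
from urllib.request import Request

def build_search_request(index, size=10, sort_field=None, host="http://localhost:9200"):
    """Prepare a POST to /<index>/_search with a match_all query (hypothetical helper)."""
    body = {"size": size, "query": {"match_all": {}}}
    if sort_field:
        # Assumes the field exists in the index and is sortable.
        body["sort"] = [{sort_field: {"order": "desc"}}]
    return Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("movies", size=5, sort_field="rating")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To run it against a started stack: urllib.request.urlopen(req)
```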

TODO

  • Endpoint to drop es7 index and mongo collection
  • React app to display movies
    • maven build ui, copy build to docker
    • query stats
    • query stats from es7
    • infinite scroll
    • filters: genre, years, rating, director?
    • fix movie stats filter
  • deploy to aws
  • avoid scanning all files
