javsanbel2 / streaming-news-worldwide

NewsAPI and some bigdata tech

Streaming news world wide

This project pulls a large volume of streaming data from the NewsAPI. A Java Spring Boot application writes the requested data to a Kafka cluster. Spark then processes the data and stores the results in HBase and Hive.
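As a rough illustration of the first step, the sketch below builds a request URI for the public NewsAPI `/v2/everything` endpoint from a search query. The class name `NewsApiRequest`, the method `build`, and the placeholder API key are assumptions made for this example and are not taken from the repository; the actual collector may build its requests differently, and publishing the response to Kafka is not shown.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the repository.
class NewsApiRequest {
    // Public NewsAPI v2 endpoint that searches all articles.
    static final String ENDPOINT = "https://newsapi.org/v2/everything";

    // Builds the request URI for a search query.
    // Both values are URL-encoded so that spaces and symbols are safe.
    static URI build(String query, String apiKey) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String key = URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?q=" + q + "&apiKey=" + key);
    }
}
```

A call such as `NewsApiRequest.build("big data", "YOUR_API_KEY")` would give the collector a URI to fetch before forwarding the JSON response to Kafka.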

Pipeline

Structure

  • collector: Fetches data from the APIs and publishes it to the Kafka cluster
  • consumer: Receives the data, processes it with Spark Streaming, and saves it to Hive and HBase
  • start.sh: Script to start the project
  • test.sh: Script to run the tests in both projects
  • scripts.sh: Scripts to manage Kafka and stop the servers
  • config.txt: Configuration applied to the project (query to run, time intervals, ...)
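The format of config.txt is not documented here. As a minimal sketch, assuming it holds simple `key=value` lines, the settings could be read with `java.util.Properties`. The keys `query` and `interval.seconds`, their default values, and the class `CollectorConfig` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical settings holder; the key names are assumptions, not the repository's.
class CollectorConfig {
    final String query;          // search query sent to the news API
    final long intervalSeconds;  // pause between two requests

    CollectorConfig(String query, long intervalSeconds) {
        this.query = query;
        this.intervalSeconds = intervalSeconds;
    }

    // Parses key=value text, falling back to defaults for missing keys.
    static CollectorConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String query = props.getProperty("query", "news").trim();
        long interval = Long.parseLong(props.getProperty("interval.seconds", "60").trim());
        return new CollectorConfig(query, interval);
    }
}
```

With a file containing `query=bitcoin` and `interval.seconds=30`, `CollectorConfig.parse` would yield the query `bitcoin` and a 30-second interval; an empty file falls back to the assumed defaults.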

Run project

Change into the project folder and run:

    ./test.sh
    ./run.sh

Documentation v1 (Updated): Google Slides

Documentation v2: Google Doc


Languages

Java 48.8%, Scala 42.9%, Shell 8.3%