miladbehrooz / Dockerized_Data_Pipeline

A Dockerized data pipeline that performs sentiment analysis on tweets and posts the results to a Slack channel via a bot

A Dockerized Data Pipeline for Sentiment Analysis on tweets

There are 5 steps in the data pipeline:

  • Extract tweets with the Tweepy API
  • Load the tweets into a MongoDB database
  • Extract the tweets from MongoDB and perform sentiment analysis on them (the extract and transform stages of the ETL job)
  • Load the tweets and their corresponding sentiment assessments into a PostgreSQL database (the load stage of the ETL job)
  • Extract the data from the PostgreSQL database and post it to a Slack channel with a Slack bot
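The transform stage and the Slack message formatting can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the word lists are a toy stand-in for a real sentiment model, the `text` field on each tweet document is an assumption, and the names `score_sentiment`, `transform`, and `to_slack_payload` are hypothetical. The database reads and writes and the actual Slack call are left out.

```python
import re

# Toy word lists: an illustrative stand-in for a real sentiment model.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def score_sentiment(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def transform(tweet_docs):
    """Turn tweet documents (assumed to carry a 'text' field) into rows."""
    return [(doc["text"], score_sentiment(doc["text"])) for doc in tweet_docs]


def to_slack_payload(row):
    """Build a JSON-serializable message body for one scored tweet."""
    text, sentiment = row
    return {"text": f"{text}\nSentiment score: {sentiment:+.2f}"}


# Example documents shaped like what the MongoDB stage might return.
docs = [
    {"text": "I love this great library"},
    {"text": "What a terrible, awful day"},
    {"text": "Docker compose is running"},
]
rows = transform(docs)
```

Keeping the scorer behind a single function makes it straightforward to swap the toy word lists for a proper sentiment library without touching the load or posting stages.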

[Image: workflow diagram]

Usage

  • Install Docker on your machine
  • Clone the repository: git clone https://github.com/miladbehrooz/Dockerized_Data_Pipeline.git
  • Get credentials for the Twitter API and insert them in tweet_collector/credentials.py
  • Get credentials for the Slack bot and insert them in slack_bot/credentials.py
  • Run docker-compose build, then docker-compose up in a terminal
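If you would rather not hard-code secrets in the two credentials.py files, one option is to have them read values from environment variables. The sketch below is an assumption about how that could look, not the layout the repository uses: the variable names in `TWITTER_KEYS` and `SLACK_KEYS` and the helper `load_credentials` are hypothetical, so check the names that the collector and bot code actually import.

```python
import os

# Hypothetical variable names; the repository's credentials.py may use others.
TWITTER_KEYS = ("TWITTER_API_KEY", "TWITTER_API_SECRET")
SLACK_KEYS = ("SLACK_WEBHOOK_URL",)


def load_credentials(keys, env=None):
    """Read each key from the environment, failing early on missing values."""
    env = os.environ if env is None else env
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return {k: env[k] for k in keys}
```

Failing at startup with the list of missing names is easier to diagnose than an authentication error raised later from inside a running container.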

License: MIT License

Languages

Python 92.4%, Dockerfile 7.6%