espoirMur / balobi_nini

An end-to-end data science project in which I use Tweepy and Airflow to collect tweets related to the DRC, and topic modeling techniques to discover which topics Congolese people are talking about on Twitter.

Home Page: https://twitter.com/olobi_nini

About The Project

The name means "What did they say?" or "What are they talking about?" in Lingala.

This project is finally taking shape, and I have finally figured out what I am building.

I am testing several concepts in this project.

Here are the features that currently work in this repository:

  • The project uses Tweepy together with Apache Airflow to collect all tweets containing the words RDC and DRC on an hourly basis.

  • Once the tweets are collected, I clean them and generate a WordCloud that shows the words Congolese people used most on social media on a given day.

  • I again use Airflow, with Celery as the scheduler, to tweet the generated WordCloud on a daily basis.

  • I use Streamlit to display the WordCloud as an image on a web page.
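The cleaning step that precedes the WordCloud can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this repository: the stop-word list, the regular expressions, and the function names `clean_tweet` and `word_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only. A real run would need
# a much fuller list covering French, Lingala, Swahili and English.
STOP_WORDS = {"the", "a", "of", "in", "to", "and", "is", "rt",
              "la", "le", "les", "de", "en", "et"}

# Matches URLs and @mentions so they can be dropped before counting words.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Matches runs of letters only (no digits, underscores or punctuation).
WORD = re.compile(r"[^\W\d_]+")


def clean_tweet(text):
    """Lowercase a tweet and return its words without URLs, mentions or stop words."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    return [w for w in WORD.findall(text) if w not in STOP_WORDS and len(w) > 1]


def word_frequencies(tweets):
    """Count how often each cleaned word appears across a batch of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts


# Made-up sample tweets, not real data.
sample = [
    "RT @user: Elections en RDC https://t.co/abc",
    "La RDC et les elections #RDC",
]
frequencies = word_frequencies(sample)
```

A word-to-count mapping like `frequencies` is the kind of input that the `wordcloud` package accepts through `WordCloud.generate_from_frequencies`, assuming that package is the one used to draw the image.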

The features I plan to implement next are:

  • Using topic modeling to identify the different topics discussed in those tweets.

  • Using sentiment analysis to display the sentiment of Congolese tweets.

  • Once I have those results, I will update the Streamlit dashboard...

Built With

Getting started

Clone the project to get a local copy on your machine.

We use Docker to build and run the project...

Install Redis

The project uses Redis as its broker. Install it and get it running using this link

Install Postgres

The project also uses Postgres as its database. Install it and create a database for the project. Keep its name somewhere for future use.

Follow this URL to create a user and a database.

Generate the .env file

cp .env.sample .env
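After copying the sample file, the values in `.env` still have to be filled in. A small check like the one below can flag variables that exist in `.env.sample` but are missing from `.env`. It is only a sketch: the helper names `parse_env_keys` and `missing_keys` and the variable names in the example are hypothetical, and the real variable names are the ones listed in `.env.sample`.

```python
def parse_env_keys(text):
    """Return the set of variable names defined in dotenv-style text."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Tolerate an optional shell-style "export " prefix.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_keys(sample_text, env_text):
    """List keys present in the sample text but absent from the env text, sorted."""
    return sorted(parse_env_keys(sample_text) - parse_env_keys(env_text))


# Hypothetical file contents, for illustration only.
sample_text = "DATABASE_URL=\nREDIS_URL=\n# a comment\nAPI_KEY=\n"
env_text = "DATABASE_URL=postgres://localhost/db\nAPI_KEY=secret\n"
missing = missing_keys(sample_text, env_text)
```

Against real files, the same function could be called with the contents of `.env.sample` and `.env`, for example read with `pathlib.Path(...).read_text()`.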

Run Docker

Make sure Docker and docker-compose are installed and Docker is running, then go to the project directory and run:

docker-compose up -d --build

Then chill until I get motivation to finish this readme

Tweaking DB migration

Update the database using this command:

docker-compose -f docker-compose-prod.yml exec streamlit-instance python manage.py db upgrade

This creates the tables used for tweet analysis.

PS: to avoid conflicts, connect to the database you are using and delete the rows in the Alembic version table:

DELETE FROM alembic_version;
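To illustrate what clearing that table does, here is a minimal sketch that runs the statement against an in-memory SQLite database. SQLite is only a stand-in for the project's Postgres database, and the revision id is made up. Alembic keeps its current revision in a table named `alembic_version` with a single `version_num` column, so deleting its rows makes Alembic treat the database as having no applied migration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('abc123')")  # made-up revision id

rows_before = conn.execute("SELECT COUNT(*) FROM alembic_version").fetchone()[0]

# The statement from the step above: remove every recorded revision.
conn.execute("DELETE FROM alembic_version")
conn.commit()

rows_after = conn.execute("SELECT COUNT(*) FROM alembic_version").fetchone()[0]
conn.close()
```

The table itself remains in place after the delete. Only the recorded revision is removed.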

Once the tables are created, run the following command to initialize the database for Airflow:

  • docker-compose -f docker-compose-prod.yml exec -T streamlit-instance airflow initdb

Roadmap

Whenever I learn something new, I like to apply it to this project.

I made a to-do list of the items I plan to work on.

Progress depends on my motivation, my mood, and how I feel when working on this project...

The to-do list can be found here...

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under my personal open source licence. See LICENSE for more information.

Contact

Espoir Murhabazi - Twitter - espoir.mur on gmail

Project Link: https://github.com/espoirMur/balobi_nini
