No libraries beyond the Anaconda distribution of Python are needed to run the code here. The code should run without issues on any Python 3 version.
I used the following libraries and versions:
- chardet -- chardet==3.0.4
- Flask -- Flask==1.1.2
- joblib -- joblib==0.14.1
- json -- json==2.0.9 (part of the Python standard library)
- nltk -- nltk==3.5
- numpy -- numpy==1.18.3
- pandas -- pandas==1.0.3
- plotly -- plotly==4.6.0
- requests -- requests==2.23.0
- scikit-learn -- scikit-learn==0.22.2.post1
- SQLAlchemy -- SQLAlchemy==1.3.16
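To check whether a local environment matches the versions listed above, a small script along these lines could be used. It is a sketch, not part of the repository. It assumes Python 3.8 or later (for `importlib.metadata`) and skips `json`, which ships with Python rather than being installed through pip. The helper names `installed_version` and `check_versions` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above; json is omitted because it
# is part of the standard library and is not installed through pip.
PINNED = {
    "chardet": "3.0.4",
    "Flask": "1.1.2",
    "joblib": "0.14.1",
    "nltk": "3.5",
    "numpy": "1.18.3",
    "pandas": "1.0.3",
    "plotly": "4.6.0",
    "requests": "2.23.0",
    "scikit-learn": "0.22.2.post1",
    "SQLAlchemy": "1.3.16",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to
    (pinned, installed); installed is None when the package is absent."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(PINNED).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

The `lookup` parameter only exists so the comparison can be exercised without touching the real environment; in normal use it defaults to `importlib.metadata.version`.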
The aim of this project was to build a web application that would categorize disaster-related messages.
This was done in two steps, with two pipelines:
- Extract-Transform-Load (ETL) pipeline, where I worked with two CSV files containing messages and categories and ended up with a database of categorized messages.
- Machine Learning pipeline, where I built, tested and compared two different models and ended up with one that predicts the categories of new messages with 95% accuracy.
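The ETL step can be sketched with pandas and Python's built-in `sqlite3` module. This is not the code in `web_app/data/process_data.py`: the shared `id` column, the `categories` column holding strings such as `related-1;request-0;...`, the table name `messages`, and the helper names `load_data`, `split_categories` and `save_data` are all assumptions made for illustration.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_data(messages_csv, categories_csv):
    """Read both CSV files and merge them on a shared "id" column (assumed name)."""
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    return messages.merge(categories, on="id")


def split_categories(df):
    """Expand a "name-0;name-1;..." string column (assumed format) into one
    integer 0/1 column per category, then drop duplicate rows."""
    expanded = df["categories"].str.split(";", expand=True)
    # Category names come from the first row: "related-1" -> "related".
    expanded.columns = [item.rsplit("-", 1)[0] for item in expanded.iloc[0]]
    for column in expanded.columns:
        # Keep the trailing digit; clip anything above 1 down to 1 (assumption).
        expanded[column] = expanded[column].str[-1].astype(int).clip(0, 1)
    cleaned = pd.concat([df.drop(columns="categories"), expanded], axis=1)
    return cleaned.drop_duplicates()


def save_data(df, database_path, table_name="messages"):
    """Write the cleaned frame to a SQLite table, replacing it if it already exists."""
    with closing(sqlite3.connect(database_path)) as conn:
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        conn.commit()
```

With hypothetical file names, `save_data(split_categories(load_data("messages.csv", "categories.csv")), "DisasterResponse.db")` would produce a SQLite file with one row per message and one 0/1 column per category. A raw `sqlite3` connection keeps the sketch light; SQLAlchemy, which is in the list above, would work as well through `create_engine("sqlite:///DisasterResponse.db")`.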
The project is organized as follows:
- README.md - this file
- web_app/ - contains the boilerplate code necessary to visualize the web application
- web_app/README.md - contains the necessary steps to run the web application in a local environment
- DisasterResponse.db (~6.5 MB) - created in your local environment when you run web_app/data/process_data.py
- classifier.pkl (~997 MB) - created in your local environment when you run web_app/models/train_classifier.py
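The Machine Learning pipeline (the step that writes classifier.pkl) can be sketched with scikit-learn as TF-IDF features feeding one binary classifier per category. The README does not say which two models were compared or how accuracy was measured, so this sketch picks a random forest and a logistic regression purely as examples and scores them by mean per-label accuracy. It uses scikit-learn's built-in tokenizer instead of NLTK to stay self-contained. The names `build_model`, `mean_label_accuracy`, `compare_models`, `save_model` and `CANDIDATES` are hypothetical.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model(estimator):
    """TF-IDF features feeding one clone of `estimator` per category column."""
    return Pipeline([
        # scikit-learn's default tokenizer stands in for any NLTK preprocessing.
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("clf", MultiOutputClassifier(estimator)),
    ])


def mean_label_accuracy(Y_true, Y_pred):
    """Share of correctly predicted (message, category) cells."""
    return float((np.asarray(Y_true) == np.asarray(Y_pred)).mean())


def compare_models(X_train, Y_train, X_test, Y_test, candidates):
    """Fit every candidate estimator and score it on the held-out messages."""
    scores = {}
    for name, estimator in candidates.items():
        model = build_model(estimator).fit(X_train, Y_train)
        scores[name] = mean_label_accuracy(Y_test, model.predict(X_test))
    return scores


def save_model(model, path="classifier.pkl"):
    """Serialize the fitted pipeline with joblib; joblib.load(path) restores it."""
    joblib.dump(model, path)


# Two example candidates (an assumption, not necessarily the models used here).
CANDIDATES = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
```

Mean per-label accuracy is lenient on a dataset where most categories are 0 for most messages; `MultiOutputClassifier.score` instead reports subset accuracy, where a message counts as correct only if all of its categories are right. Because one estimator is fitted per category, a random forest over 36 categories stores 36 forests, which may help explain the size of classifier.pkl; `joblib.dump` also accepts a `compress` argument that trades save and load speed for a smaller file.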
The dataset contains 30,000 messages drawn from events including an earthquake in Haiti in 2010, an earthquake in Chile in 2010, floods in Pakistan in 2010, super-storm Sandy in the U.S.A. in 2012, and news articles spanning a large number of years and 100s of different disasters. The data has been encoded with 36 different categories related to disaster response and has been stripped of messages with sensitive information in their entirety.
– Multilingual Disaster Response Messages, Appen Open Source Dataset