In this project, I built a machine learning pipeline that analyzes disaster data from Figure Eight and trains a model for an API that classifies disaster messages.
File structure of the project:
```
- app
| - template
| |- master.html  # main page of web app
| |- go.html  # classification result page of web app
|- run.py  # Flask file that runs the app
- data
|- disaster_categories.csv  # data to process
|- disaster_messages.csv  # data to process
|- process_data.py  # ETL pipeline
|- DisasterResponse.db  # database to save clean data to
- models
|- train_classifier.py  # ML pipeline
|- classifier.pkl  # saved model
- notebooks
|- ETL Pipeline Preparation.ipynb  # notebook for testing the ETL pipeline
|- ML Pipeline Preparation.ipynb  # notebook for testing the ML pipeline
- README.md
```
There are three main Python scripts in this project:

- `process_data.py`: contains the ETL pipeline (see the first sketch after this list), which
  - Loads the `messages` and `categories` datasets
  - Merges the two datasets
  - Cleans the data
  - Stores it in a SQLite database
- `train_classifier.py`: contains the ML pipeline (see the second sketch after this list), which
  - Loads data from the SQLite database
  - Splits the dataset into training and test sets
  - Builds a text processing and machine learning pipeline
  - Trains and tunes a model using GridSearchCV
  - Outputs results on the test set
  - Exports the final model as a pickle file
- `run.py`: the Flask file that runs the web app and renders its data visualizations using Plotly (see the third sketch after this list)
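Below is a minimal sketch of the ETL steps in `process_data.py`. It assumes the standard Figure Eight CSV layout: a shared `id` key, a semicolon-delimited `categories` column with values like `related-1`, and a `DisasterResponse` table name. Treat these as assumptions, not the script's exact code.

```python
import pandas as pd
from sqlalchemy import create_engine

# Load the messages and categories datasets
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")

# Merge the two datasets on their shared "id" column (assumed key)
df = messages.merge(categories, on="id")

# Clean the data: split the semicolon-delimited "categories" column into
# one binary column per category, e.g. "related-1" -> related = 1
split = df["categories"].str.split(";", expand=True)
split.columns = split.iloc[0].str.rsplit("-", n=1).str[0]
split = split.apply(lambda col: col.str.rsplit("-", n=1).str[1].astype(int))

df = pd.concat([df.drop(columns="categories"), split], axis=1)
df = df.drop_duplicates()

# Store the cleaned data in a SQLite database
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("DisasterResponse", engine, index=False, if_exists="replace")
```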
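Next, a minimal sketch of the ML steps in `train_classifier.py`, assuming a TF-IDF + random forest pipeline wrapped in `MultiOutputClassifier`. The table name, feature columns, and parameter grid are illustrative choices, not necessarily the script's own.

```python
import pickle

import pandas as pd
from sqlalchemy import create_engine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Load data from the SQLite database (table name assumed)
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("DisasterResponse", engine)
X = df["message"]
Y = df.iloc[:, 4:]  # assumes the category columns start at index 4

# Split the dataset into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Build a text processing and machine learning pipeline
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Train and tune the model using GridSearchCV (illustrative grid)
parameters = {"clf__estimator__n_estimators": [50, 100]}
model = GridSearchCV(pipeline, param_grid=parameters, cv=3)
model.fit(X_train, Y_train)

# Output results on the test set, one report per category
Y_pred = model.predict(X_test)
for i, column in enumerate(Y_test.columns):
    print(column, classification_report(Y_test[column], Y_pred[:, i]))

# Export the final model as a pickle file
with open("models/classifier.pkl", "wb") as f:
    pickle.dump(model, f)
```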
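Finally, a sketch of how `run.py` can serve a Plotly visualization from Flask. The genre bar chart and the `graphJSON` variable passed to `master.html` are assumptions about how the template consumes the figures, not a guaranteed description of the app's internals.

```python
import json

import pandas as pd
from flask import Flask, render_template
from plotly.graph_objs import Bar
from plotly.utils import PlotlyJSONEncoder
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data (path relative to the app directory)
engine = create_engine("sqlite:///../data/DisasterResponse.db")
df = pd.read_sql_table("DisasterResponse", engine)

@app.route("/")
@app.route("/index")
def index():
    # One example visualization: message counts per genre
    genre_counts = df.groupby("genre").count()["message"]
    graphs = [{
        "data": [Bar(x=list(genre_counts.index), y=list(genre_counts))],
        "layout": {"title": "Distribution of Message Genres"},
    }]
    # Serialize the figures for Plotly.js in master.html
    graph_json = json.dumps(graphs, cls=PlotlyJSONEncoder)
    return render_template("master.html", graphJSON=graph_json)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001, debug=True)
```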
- Run the following commands in the project's root directory to set up the database and model:
  - To run the ETL pipeline that cleans the data and stores it in the database:
    `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`
  - To run the ML pipeline that trains the classifier and saves the model:
    `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- Run the following command in the app's directory to run the web app:
  `python run.py`
- Go to http://0.0.0.0:3001/