The end objective is a machine learning model that can predict the poverty level of a household. However, before we get carried away with modeling, it's important to understand the problem and the data. We also want to evaluate several models before choosing one as the "best", and after building a model, we want to investigate its predictions. Our roadmap is therefore as follows:

1. Understand the problem (we're almost there already)
2. Exploratory Data Analysis
3. Feature engineering to create a dataset for machine learning
4. Compare several baseline machine learning models
5. Try more complex machine learning models
6. Optimize the selected model
7. Investigate model predictions in context of problem
8. Draw conclusions and lay out next steps

The steps laid out above are iterative, meaning that while we will go through them one at a time, we might go back to an earlier step and revisit some of our decisions. In general, data science is a non-linear practice where we are constantly evaluating our past decisions and making improvements. In particular, feature engineering, modeling, and optimization are steps that we often repeat because we never know if we got them right the first time!

- Python
- Anaconda
- Jupyter Notebook
- Windows 10 (21H2)