A Project by Kihoon Sohn, Leticia Beeck, Awab Idris & James Linek
This repository contains the notebooks and datasets our team used for the West Nile Virus Kaggle challenge. The challenge asks contestants to build a model that predicts the location and presence of West Nile Virus in the city of Chicago, and then to follow it with a cost-benefit analysis weighing the pros and cons of spraying the city in the warmer months.
- From the dataset, how can we determine when and where the West Nile virus will occur in the greater Chicago area?
- What preventative measures can be taken to combat the illness?
- What are the costs and benefits of prevention?
We modeled the data with AdaBoost, Logistic Regression, and Random Forest, then submitted each model's best-scoring predictions to Kaggle as a CSV file. Before applying these models, we performed extensive EDA and feature engineering.
Of the three models, Logistic Regression performed best, with a Kaggle score of 0.76181.
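The comparison described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the code from the final notebook: the data is synthetic, the hyperparameters are arbitrary, ROC AUC is assumed as the validation metric, and the `Id` / `WnvPresent` column names are an assumption about the submission format.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the engineered features (assumption:
# the real features come from the train/weather/spray data in /assets).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=42
)

# The three model families named above, with illustrative settings only.
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model and score its predicted probabilities on held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]
    scores[name] = roc_auc_score(y_val, proba)

best_name = max(scores, key=scores.get)

# Build a submission-style table from the best model's probabilities.
# Here the validation rows stand in for the Kaggle test set.
submission = pd.DataFrame(
    {
        "Id": np.arange(1, len(X_val) + 1),
        "WnvPresent": models[best_name].predict_proba(X_val)[:, 1],
    }
)
csv_text = submission.to_csv(index=False)  # pass a file path to write to disk
```

Scoring probabilities rather than hard class labels matters on data this imbalanced, since a model that always predicts "no virus" would still look accurate. Which model wins on the synthetic data says nothing about the real dataset.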
- Project 4 final notebook.ipynb : final cleaned notebook of the work (EDA, modeling, and export of the CSV for Kaggle submission).
- pred.csv : the CSV file submitted to Kaggle.
- README.md : contains the executive summary and instructions.
- (GA-DSI)Instruction.md : instructions given for the team challenge in the General Assembly Data Science Immersive course (cohort: GA-DSI-US-4).
- DESCR.md : descriptions of the variables, taken from the Kaggle competition page.
- /assets : folder containing the datasets (train, test, weather, and spray data).
- /archive : archive of the files and notebooks produced during the team's work.