haoapple / predict_west_nile_virus

A team repository for tackling a Kaggle challenge (GA-DSI).

WEST NILE IN CHICAGO: A PREDICTIVE ANALYSIS

A Project by Kihoon Sohn, Leticia Beeck, Awab Idris & James Linek

This repository contains the notebooks and datasets produced by the team members listed above for the West Nile Virus Kaggle challenge. The challenge asks contestants to build a model that predicts when and where West Nile Virus will be present in the city of Chicago, and then to carry out a cost-benefit analysis weighing the pros and cons of spraying the city during the warmer months.

GUIDING QUESTIONS

  • From the dataset, how can we predict when and where West Nile virus will occur in the greater Chicago area?
  • What preventative measures can be taken to combat the illness?
  • What are the costs and benefits of prevention?

METHODOLOGY

We built AdaBoost, Logistic Regression and Random Forest models, took the best-scoring version of each, and submitted its predictions to Kaggle as a CSV file. Before modeling, we performed extensive exploratory data analysis (EDA) and feature engineering.
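As a rough sketch of the submission step, the code below writes one model's predicted probabilities to a CSV file using only the standard library. It assumes a two-column `Id,WnvPresent` layout with 1-based ids in test-set order; the helper name `write_submission` is hypothetical, and the team's notebook may well have used a different route (for example pandas `DataFrame.to_csv`).

```python
import csv
import io
from typing import Sequence, TextIO


def write_submission(probabilities: Sequence[float], out: TextIO) -> None:
    """Write predicted WNV probabilities as a Kaggle-style submission CSV.

    Hypothetical helper: assumes an `Id,WnvPresent` header and 1-based ids
    that follow the row order of the test set.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "WnvPresent"])
    for row_id, p in enumerate(probabilities, start=1):
        # Reject values that cannot be probabilities before they reach the file.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for Id {row_id}: {p}")
        writer.writerow([row_id, p])


# Usage: write to an in-memory buffer; pass open("pred.csv", "w", newline="")
# instead to produce a file on disk.
buffer = io.StringIO()
write_submission([0.12, 0.87, 0.05], buffer)
print(buffer.getvalue())
```

Keeping probabilities rather than rounding them to 0/1 preserves the ranking of the predictions, which a probability-based leaderboard score depends on.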

RESULTS

Of the three models, Logistic Regression performed best, with a Kaggle score of 0.76181.
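The Kaggle West Nile Virus competition evaluates submissions by area under the ROC curve (AUC). AUC equals the probability that a randomly chosen positive record receives a higher predicted score than a randomly chosen negative one, with ties counting as half. The plain-Python sketch below computes it that way for a quick local validation check; the labels and scores are made up for illustration and are not project data, and in practice a library routine such as scikit-learn's `roc_auc_score` would usually be used instead.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Runs in O(n_pos * n_neg), which is fine for a small local check.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (illustrative only).
labels = [0, 0, 1, 1, 0, 1]
probs = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(roc_auc(labels, probs))  # 8 of the 9 positive/negative pairs are ordered correctly
```

Because only the ordering of the scores matters, any monotonic rescaling of a model's probabilities leaves its AUC unchanged.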

DIRECTORY

  • Project 4 final notebook.ipynb : final, cleaned notebook of the work (EDA, modeling, and export of the Kaggle submission CSV).

  • pred.csv : CSV file submitted to Kaggle.

  • README.md : contains executive summary and instructions.

  • (GA-DSI)Instruction.md : instructions for the team challenge, as provided by the General Assembly Data Science Immersive course (cohort: GA-DSI-US-4).

  • DESCR.md : descriptions of the variables, taken from the original Kaggle competition page.

  • /assets : folder containing the datasets (train, test, weather, and spray data).

  • /archive : archive of the files and notebooks created during the team's work.

License: MIT License

