Amuyen / lab-imbalanced-data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

logo_ironhack_blue 7

Lab | Imbalanced data

We will be using the files_for_lab/customer_churn.csv dataset to build a churn predictor.

Instructions

  1. Load the dataset and explore the variables.
  2. We will try to predict variable Churn using a logistic regression on variables tenure, SeniorCitizen,MonthlyCharges.
  3. Split the Dataset into X ('tenure', 'SeniorCitizen', 'MonthlyCharges') and y ('Churn')
  4. Build the logistic regression model.
  5. Evaluate the model.
  6. Even a simple model will give us more than 70% accuracy. Why?
  7. Synthetic Minority Oversampling TEchnique (SMOTE) is an over sampling technique based on nearest neighbors that adds new points between existing points. Apply imblearn.over_sampling.SMOTE to the dataset. Build and evaluate the logistic regression model. Is it there any improvement?

About


Languages

Language:Jupyter Notebook 100.0%