nataschaberg / classification-lab-u3

Lab | Making predictions with logistic regression

In this lab, you will be using the Sakila database of movie rentals.

To optimize our inventory, we would like to know which films will be rented next month, so we have been asked to build a model that predicts this.

Instructions

  1. Create a query or queries to extract the information you think may be relevant for building the prediction model. It should include some film features and some rental features.
  2. Read the data into a Pandas dataframe.
  3. Analyze the extracted features and transform them. You may need to encode some categorical variables or scale some numerical variables.
  4. Create a query to get the list of films and a boolean indicating whether each film was rented last month. This will be our target variable.
  5. Create a logistic regression model to predict this variable from the cleaned data.
  6. Evaluate the results.
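
As a rough illustration of step 4, the sketch below builds the target with a single query. It is only a sketch under assumptions: it uses an in-memory SQLite database with three simplified tables named after Sakila's `film`, `inventory` and `rental`, the rows are invented for the example, and the month boundaries (`start`, `end`) are hypothetical parameters you would set to whichever month counts as "last month" in your data.

```python
import sqlite3

# Toy in-memory stand-in for three Sakila tables. The rows are made up
# for illustration and are NOT taken from the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                         rental_date TEXT, inventory_id INTEGER);
    INSERT INTO film VALUES (1, 'FILM A'), (2, 'FILM B'), (3, 'FILM C');
    INSERT INTO inventory VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO rental VALUES (100, '2005-08-02 10:00:00', 10);
    INSERT INTO rental VALUES (101, '2005-07-15 09:30:00', 20);
    INSERT INTO rental VALUES (102, '2005-08-20 18:45:00', 10);
""")

# LEFT JOINs keep films that were never rented, so every film gets a row.
# The CASE expression yields 1 for a rental inside [start, end) and 0
# otherwise (including films with no rentals at all); MAX collapses the
# rentals of each film into a single 0/1 flag.
TARGET_QUERY = """
    SELECT f.film_id,
           f.title,
           MAX(CASE WHEN r.rental_date >= :start AND r.rental_date < :end
                    THEN 1 ELSE 0 END) AS rented_last_month
    FROM film AS f
    LEFT JOIN inventory AS i ON i.film_id = f.film_id
    LEFT JOIN rental AS r ON r.inventory_id = i.inventory_id
    GROUP BY f.film_id, f.title
    ORDER BY f.film_id
"""

# Assumed reference month: August 2005 (hypothetical choice for the toy rows).
rows = conn.execute(
    TARGET_QUERY, {"start": "2005-08-01", "end": "2005-09-01"}
).fetchall()

for film_id, title, rented in rows:
    print(film_id, title, bool(rented))
```

Against the real Sakila database the same query shape should apply, although the placeholder syntax depends on the database driver you use. The result can then be loaded into a dataframe (for example with `pandas.read_sql_query`) and joined to the feature table on `film_id`.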



Lab | Imbalanced data

We will be using the files_for_lab/customer_churn.csv dataset to build a churn predictor.

Instructions

  1. Load the dataset and explore the variables.
  2. We will try to predict the variable Churn using a logistic regression on the variables tenure, SeniorCitizen, and MonthlyCharges.
  3. Extract the target variable.
  4. Extract the independent variables and scale them.
  5. Build the logistic regression model.
  6. Evaluate the model.
  7. Even a simple model will give us more than 70% accuracy. Why?
  8. Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique based on nearest neighbors that adds new points between existing points. Apply imblearn.over_sampling.SMOTE to the dataset. Build and evaluate the logistic regression model. Is there any improvement?
  9. Tomek links are pairs of very close instances that belong to opposite classes. Removing the majority-class instance of each pair increases the space between the two classes, which makes classification easier. Apply imblearn.under_sampling.TomekLinks to the dataset. Build and evaluate the logistic regression model. Is there any improvement?
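
To make the phrase "adds new points between existing points" in step 8 more concrete, here is a minimal pure-Python sketch of the interpolation idea behind SMOTE. It is not imblearn's implementation: the function name `smote_like_samples`, its parameters, and the toy points are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each placed on the segment between
    a randomly chosen minority point and one of its k nearest minority
    neighbors. Illustration only, not imblearn's algorithm verbatim."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base point from the minority class.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbors among the other minority points.
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Interpolate: gap = 0 gives the base point, gap near 1 the neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical, already-scaled minority-class points (two features each).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (5.0, 5.0)]
new_points = smote_like_samples(minority, n_new=6)

for point in new_points:
    print(point)
```

In the lab itself the resampling is done by the library, typically through the `fit_resample(X, y)` method that both `SMOTE` and `TomekLinks` provide. Resampling only the training split, and evaluating on untouched test data, keeps the evaluation honest.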
