isra-st / lab-cleaning-categorical-data

lab-cleaning-categorical-data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

logo_ironhack_blue 7

Lab | Cleaning categorical data

For this lab, we will be using the dataset in the Customer Analysis Business Case. This dataset can be found in files_for_lab folder. In this lab we will explore categorical data. You can also continue working on the same jupyter notebook from the previous lab. However that is not necessary.

Instructions

  1. Import the necessary libraries if you are starting a new notebook.
  2. Load the csv. Use the variable customer_df as customer_df = pd.read_csv().
  3. What should we do with the customer_id column?
  4. Load the continuous and discrete variables into numericals_df and categorical_df variables, for eg.:
    numerical_df = customer_df.select_dtypes()
    categorical_df = customer_df.select_dtypes()
  5. Plot every categorical variable. What can you see in the plots? Note that in the previous lab you used a bar plot to plot categorical data, with each unique category in the column on the x-axis and an appropriate measure on the y-axis. However, this time you will try a different plot. This time in each plot for the categorical variable you will have, each unique category in the column on the x-axis and the target(which is numerical) on the Y-axis
  6. For the categorical data, check if there is any data cleaning that need to perform. Hint: You can use the function value_counts() on each of the categorical columns and check the representation of different categories in each column. Discuss if this information might in some way be used for data cleaning.

About

lab-cleaning-categorical-data


Languages

Language:Jupyter Notebook 100.0%