arektarnawski / lab-customer-analysis-round-4

Lab | Customer Analysis Round 4

Today's lesson covered continuous distributions (mainly the normal distribution), linear regression, and how multicollinearity can affect a model. This lab tests those topics using the marketing_customer_analysis.csv file, the same data used in the previous labs (rounds 2 and 3). You can continue working in the same Jupyter file. The data file is in the files_for_lab folder.

Get the data

Use the Jupyter file from the last lab (Customer Analysis Round 3).

Complete the following tasks

  • Check the data types of the columns. Store the numerical columns in a dataframe called numerical and the categorical columns in a dataframe called categoricals. (You can use np.number and object to select the numerical and categorical data types respectively. The np.object alias was removed in NumPy 1.24, so use the built-in object instead.)
  • Next, check the normality of the numerical variables visually:
    • Use the seaborn library to construct distribution plots for the numerical variables.
    • Use Matplotlib to construct histograms.
    • Do the distributions of the different numerical variables look like a normal distribution?
  • For the numerical variables, check the multicollinearity between the features. Note that the column total_claim_amount will be used later as the target variable.
  • Write code for both the correlation matrix and a seaborn heatmap. If any pair of features has a high correlation (greater than 0.9), drop one of the two. If no pair does, do not drop any features.
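
The tasks above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper names (split_by_dtype, plot_distributions, plot_correlation_heatmap, drop_highly_correlated) are hypothetical, and it assumes the data has already been loaded into a pandas dataframe in the previous lab. Excluding total_claim_amount from the correlation check and dropping the later column of each highly correlated pair are also assumptions; the lab does not say which feature of a pair to drop.

```python
import numpy as np
import pandas as pd


def split_by_dtype(df):
    """Return (numerical, categoricals) dataframes split by column dtype."""
    numerical = df.select_dtypes(include=np.number)
    # "object" replaces the removed np.object alias.
    categoricals = df.select_dtypes(include="object")
    return numerical, categoricals


def plot_distributions(numerical):
    """Draw a seaborn and a Matplotlib histogram side by side per column."""
    # Imported here so the data helpers work without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    for col in numerical.columns:
        fig, (ax_sns, ax_plt) = plt.subplots(1, 2, figsize=(10, 3))
        sns.histplot(numerical[col], kde=True, ax=ax_sns)
        ax_plt.hist(numerical[col].dropna(), bins=30)
        ax_sns.set_title(f"{col} (seaborn)")
        ax_plt.set_title(f"{col} (Matplotlib)")
        fig.tight_layout()
    plt.show()


def plot_correlation_heatmap(numerical):
    """Show the correlation matrix of the numerical columns as a heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr = numerical.corr()
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    return ax


def drop_highly_correlated(numerical, target="total_claim_amount", threshold=0.9):
    """Drop one feature from each pair whose |correlation| exceeds threshold.

    Assumption: the target column is left out of the check and never dropped.
    """
    features = numerical.drop(columns=[target], errors="ignore")
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it correlates strongly with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return numerical.drop(columns=to_drop), to_drop
```

With a dataframe df from the previous lab, one possible flow is numerical, categoricals = split_by_dtype(df), then plot_distributions(numerical) and plot_correlation_heatmap(numerical) for the visual checks, and finally numerical, dropped = drop_highly_correlated(numerical). If dropped comes back empty, no pair exceeded the 0.9 threshold and nothing is removed.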
