HariShanmugavelu / MachineLearning_AppliedStatistics

Applied Statistics used for Machine Learning problems

MachineLearning_AppliedStatistics

Imported the necessary libraries

Read the data as a data frame

Performed basic EDA covering the following, printing the insights at each step:

a. Shape of the data

b. Data type of each attribute

c. Checking the presence of missing values

d. 5 point summary of numerical attributes

e. Distribution of ‘bmi’, ‘age’ and ‘charges’ columns.

f. Measure of skewness of ‘bmi’, ‘age’ and ‘charges’ columns

g. Checking the presence of outliers in ‘bmi’, ‘age’ and ‘charges’ columns

h. Distribution of categorical columns (include children)

i. Pair plot that includes all the columns of the data frame
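The non-plotting EDA steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data frame is synthetic stand-in data, the column names `sex` and `smoker` are assumptions (only ‘bmi’, ‘age’, ‘charges’ and children are named in this README), and the 1.5 × IQR rule is just one common way to flag outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the notebook's dataset); column names
# "sex" and "smoker" are assumed for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "sex": rng.choice(["female", "male"], size=n),
    "bmi": rng.normal(30, 6, size=n).round(1),
    "children": rng.integers(0, 4, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "charges": rng.lognormal(mean=9, sigma=0.8, size=n).round(2),
})

num_cols = ["bmi", "age", "charges"]

# a. Shape, b. data types, c. missing values per column
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# d. 5 point summary: min, Q1, median, Q3, max
five_point = df[num_cols].describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five_point)

# f. Skewness of the numerical columns
skewness = df[num_cols].skew()
print(skewness)

# g. Outlier count using the 1.5 * IQR rule (an assumed choice of rule)
def iqr_outlier_count(s: pd.Series) -> int:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    return int(mask.sum())

outliers = {col: iqr_outlier_count(df[col]) for col in num_cols}
print(outliers)

# h. Distribution of categorical columns, including children
for col in ["sex", "smoker", "children"]:
    print(df[col].value_counts())
```

The distribution plots and the pair plot (steps e and i) are left out to keep the sketch free of plotting dependencies; a histogram per column and a library pair plot would be the usual way to produce them.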

The notebook also answers the following questions with statistical evidence:

a. Do charges of people who smoke differ significantly from the people who don't?

b. Does bmi of males differ significantly from that of females?

c. Is the proportion of smokers significantly different in different genders?

d. Is the distribution of bmi the same across women with no children, one child and two children?
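One plausible way to test the four questions above is sketched below. This README does not say which tests the notebook uses, so the choices here are assumptions: Welch's two-sample t-test for (a) and (b), a chi-square test of independence for (c), and one-way ANOVA for (d). The data is synthetic, with a smoker effect on charges built in on purpose, and the column names `sex` and `smoker` are assumed.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (NOT the notebook's dataset). Smokers are given
# higher charges deliberately so that test (a) has something to detect.
rng = np.random.default_rng(42)
n = 300
sex = rng.choice(["female", "male"], size=n)
smoker = rng.choice(["yes", "no"], size=n, p=[0.25, 0.75])
df = pd.DataFrame({
    "sex": sex,
    "smoker": smoker,
    "bmi": rng.normal(30, 6, size=n),
    "children": rng.integers(0, 4, size=n),
    "charges": rng.normal(10000, 3000, size=n) + 20000 * (smoker == "yes"),
})

alpha = 0.05  # assumed significance level
p_values = {}

# a. Charges of smokers vs non-smokers: Welch's two-sample t-test
res_a = stats.ttest_ind(
    df.loc[df["smoker"] == "yes", "charges"],
    df.loc[df["smoker"] == "no", "charges"],
    equal_var=False,
)
p_values["a_charges_by_smoker"] = float(res_a.pvalue)

# b. bmi of males vs females: Welch's two-sample t-test
res_b = stats.ttest_ind(
    df.loc[df["sex"] == "male", "bmi"],
    df.loc[df["sex"] == "female", "bmi"],
    equal_var=False,
)
p_values["b_bmi_by_sex"] = float(res_b.pvalue)

# c. Proportion of smokers by gender: chi-square test of independence
table = pd.crosstab(df["sex"], df["smoker"])
chi2, p_c, dof, expected = stats.chi2_contingency(table)
p_values["c_smoker_by_sex"] = float(p_c)

# d. bmi of women with 0, 1 and 2 children: one-way ANOVA
women = df[df["sex"] == "female"]
groups = [women.loc[women["children"] == k, "bmi"] for k in (0, 1, 2)]
res_d = stats.f_oneway(*groups)
p_values["d_bmi_by_children"] = float(res_d.pvalue)

# Reject the null hypothesis of "no difference" when p < alpha
for name, p in p_values.items():
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

Note that ANOVA compares group means rather than whole distributions, so it answers question (d) only under the usual normality and equal-variance assumptions; a non-parametric alternative such as the Kruskal-Wallis test would be another reasonable choice.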

Languages

Language:Jupyter Notebook 100.0%