
Section Recap

Introduction

This short lesson summarizes the topics we covered in section 10 and why they'll be important to you as a data scientist.

Objectives

You will be able to:

  • Understand and explain what was covered in this section
  • Understand and explain why this section will help you become a data scientist

Key Takeaways

In this section, the nominal focus was on how to perform a linear regression, but the real value was learning how to think about applying machine learning models to data sets. Key takeaways include:

  • The Pearson correlation coefficient (ranging from -1 to 1) is a standard way to describe the strength and direction of the linear relationship between two variables
  • Statistical learning theory deals with the problem of finding a predictive function based on data
  • A loss function calculates how well a given model represents the relationship between data values
  • A linear regression is simply a (straight) line of best fit for predicting a continuous value (y = mx + c)
  • The Coefficient of Determination (R-squared) can be used to determine how well a given line fits a given data set (see the first code sketch after this list)
  • Certain assumptions must hold true for a least squares linear regression to be useful: linearity, normality, and homoscedasticity
  • Q-Q plots can check for normality in residual errors
  • The Jarque-Bera test can be used to test for normality - especially when the number of data points is large
  • The Goldfeld-Quandt test can be used to check for homoscedasticity (see the second code sketch after this list)
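
As a minimal sketch of the first few takeaways, assuming a small synthetic data set (the x and y arrays below are made up purely for illustration and are not part of the lesson), the following snippet computes the Pearson correlation, fits an ordinary least squares line with statsmodels, and reports the R-squared of the fit.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Synthetic data for illustration only: a roughly linear relationship plus noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 3 * x + 5 + rng.normal(scale=2, size=x.size)

# Pearson correlation coefficient (ranges from -1 to 1)
r, p_value = stats.pearsonr(x, y)
print(f"Pearson r: {r:.3f}")

# Ordinary least squares fit of y = mx + c (add_constant supplies the intercept term)
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
print(model.params)                       # [intercept c, slope m]
print(f"R-squared: {model.rsquared:.3f}")
```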

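The assumption checks listed above can then be run against that fitted model. The sketch below, continuing from the hypothetical model object in the previous snippet, draws a Q-Q plot of the residuals, applies the Jarque-Bera test for normality, and applies the Goldfeld-Quandt test for homoscedasticity; the function calls come from statsmodels, while the interpretation comments are a simplified rule of thumb rather than part of the original lesson.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Q-Q plot: residuals should fall roughly on the 45-degree line if they are normal
sm.qqplot(model.resid, line="45", fit=True)
plt.title("Q-Q plot of residuals")
plt.show()

# Jarque-Bera test for normality of the residuals (most useful with large samples);
# a small p-value is evidence against normality
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)
print(f"Jarque-Bera: statistic={jb_stat:.3f}, p-value={jb_pvalue:.3f}")

# Goldfeld-Quandt test for homoscedasticity;
# a small p-value suggests the residual variance is not constant
gq_stat, gq_pvalue, _ = het_goldfeldquandt(model.resid, model.model.exog)
print(f"Goldfeld-Quandt: statistic={gq_stat:.3f}, p-value={gq_pvalue:.3f}")
```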