
Section Recap

Introduction

This short lesson summarizes the topics we covered in section 10 and why they'll be important to you as a data scientist.

Objectives

You will be able to:

  • Understand and explain what was covered in this section
  • Understand and explain why this section will help you become a data scientist

Key Takeaways

In this section, the nominal focus was on how to perform a linear regression, but the real value was learning how to think about applying machine learning models to data sets. Key takeaways include:

  • The Pearson correlation coefficient (ranging from -1 to 1) is a standard way to describe the strength and direction of the linear relationship between two variables
  • Statistical learning theory deals with the problem of finding a predictive function based on data
  • A loss function calculates how well a given model represents the relationship between data values
  • A linear regression is simply a (straight) line of best fit for predicting a continuous value (y = mx + c)
  • The Coefficient of Determination (R-squared) can be used to determine how well a given line fits a given data set (see the first code sketch after this list)
  • Certain assumptions must hold true for a least squares linear regression to be useful: linearity, normality, and homoscedasticity
  • Q-Q plots can check for normality in residual errors
  • The Jarque-Bera test can be used to test for normality - especially when the number of data points is large
  • The Goldfeld-Quandt test can be used to check for homoscedasticity (see the second code sketch after this list)
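
As a minimal sketch of the first few takeaways, assuming a small synthetic data set (the x and y arrays below are made up purely for illustration and are not part of the lesson), the following snippet computes the Pearson correlation, fits an ordinary least squares line with statsmodels, and reports the R-squared of the fit.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Synthetic data for illustration only: a roughly linear relationship plus noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 3 * x + 5 + rng.normal(scale=2, size=x.size)

# Pearson correlation coefficient (ranges from -1 to 1)
r, p_value = stats.pearsonr(x, y)
print(f"Pearson r: {r:.3f}")

# Ordinary least squares fit of y = mx + c (add_constant supplies the intercept term)
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
print(model.params)                       # [intercept c, slope m]
print(f"R-squared: {model.rsquared:.3f}")
```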

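The assumption checks listed above can then be run against that fitted model. The sketch below, continuing from the hypothetical model object in the previous snippet, draws a Q-Q plot of the residuals, applies the Jarque-Bera test for normality, and applies the Goldfeld-Quandt test for homoscedasticity; the function calls come from statsmodels, while the interpretation comments are a simplified rule of thumb rather than part of the original lesson.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Q-Q plot: residuals should fall roughly on the 45-degree line if they are normal
sm.qqplot(model.resid, line="45", fit=True)
plt.title("Q-Q plot of residuals")
plt.show()

# Jarque-Bera test for normality of the residuals (most useful with large samples);
# a small p-value is evidence against normality
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)
print(f"Jarque-Bera: statistic={jb_stat:.3f}, p-value={jb_pvalue:.3f}")

# Goldfeld-Quandt test for homoscedasticity;
# a small p-value suggests the residual variance is not constant
gq_stat, gq_pvalue, _ = het_goldfeldquandt(model.resid, model.model.exog)
print(f"Goldfeld-Quandt: statistic={gq_stat:.3f}, p-value={gq_pvalue:.3f}")
```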