learn-co-students / dsc-distributions-section-recap-v2-1-seattle-ds-012720

Statistical Distributions - Recap

Introduction

This short lesson summarizes the topics we covered in this section and why they'll be important to you as a data scientist.

Key Takeaways

In this section, we really dug into statistical distributions.

Key takeaways include:

  • There are two types of distributions - continuous, where (subject to measurement and/or storage precision) a variable can take any value in a range, so there are effectively an infinite number of possible values, and discrete, where the possible values are distinct and countable. For example, a person's height is continuous - assuming a suitably precise tape measure - whereas the number of bedrooms in a house is discrete
  • How to describe the distribution of data sets using Probability Mass Functions (for discrete data), Probability Density Functions (for continuous data), and Cumulative Distribution Functions (for either)
  • One type of discrete distribution, the binomial distribution, deals with the number of successes in a series of independent boolean events or trials - often called Bernoulli Trials
  • A Normal distribution is the classic "bell curve" with approximately 68% of the probability mass within 1 SD of the mean, 95% within 2 SDs and 99.7% within 3 SDs
  • Differences between the normal and the standard normal distribution
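  • How $z$-scores express the number of standard deviations an observation lies from the mean, and how p-values give the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true
  • How a one-sample $z$-test is a very simple form of hypothesis testing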
  • How skewness (asymmetry) and kurtosis (heaviness of the tails) can be used to measure how different a given distribution is from a normal distribution
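
As a quick refresher on the normal distribution, $z$-scores and the one-sample $z$-test from the list above, here is a minimal sketch using only Python's standard library. The helper names (`mass_within`, `z_score`, `one_sample_z_test`) and the sample numbers are made up for illustration and are not taken from the lessons in this section.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def mass_within(k):
    """Probability that a normal variable falls within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Two-sided one-sample z-test; returns (z statistic, p-value).

    Assumes the population standard deviation pop_sd is known.
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Check the 68 / 95 / 99.7 rule numerically.
for k in (1, 2, 3):
    print(f"within {k} SD: {mass_within(k):.4f}")

# Hypothetical example: a sample of 50 with mean 103, tested against
# a population with mean 100 and known SD 15.
z, p = one_sample_z_test(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=50)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value would suggest that the sample mean is unlikely under the null hypothesis. The cutoff you compare it against (for example 0.05) is a choice you make before running the test.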

In the Appendix to this Module, you'll have the opportunity to learn about:

  • the uniform distribution, which represents processes where each outcome is equally likely, like rolling a fair die
  • the Poisson distribution, which can be used to model the probability of a given number of events occurring over a given time period
  • the exponential distribution, which can be used to describe the probability distribution of the amount of time it may take before a given event occurs
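
To give a rough feel for these three distributions before you reach the Appendix, the sketch below writes out their textbook formulas in plain Python. The function names and the rate of 2 events per hour are hypothetical and chosen only for illustration.

```python
from math import exp, factorial


def uniform_pmf(sides=6):
    """Discrete uniform PMF: every face of a fair die is equally likely."""
    return {face: 1 / sides for face in range(1, sides + 1)}


def poisson_pmf(k, rate):
    """Probability of exactly k events in one interval, given an average rate."""
    return rate ** k * exp(-rate) / factorial(k)


def exponential_cdf(t, rate):
    """Probability that the waiting time until the next event is at most t."""
    return 1 - exp(-rate * t)


# Fair six-sided die: each outcome has probability 1/6.
print(uniform_pmf())

# Hypothetical process averaging 2 events per hour.
rate = 2.0
for k in range(5):
    print(f"P({k} events in an hour) = {poisson_pmf(k, rate):.4f}")

# Chance that the first event arrives within half an hour.
print(f"P(wait <= 0.5 h) = {exponential_cdf(0.5, rate):.4f}")
```

The Poisson and exponential distributions describe the same process from two angles: the probability of seeing zero events in a window of length `t`, `poisson_pmf(0, rate * t)`, equals the probability that the wait exceeds `t`, `1 - exponential_cdf(t, rate)`.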
