thanhtcptit / Statistics-with-Python-Specialization

Resources of the Statistics with Python Specialization on Coursera

Syllabus

Course 1: Understanding and Visualizing Data with Python

In this course, learners will be introduced to the field of statistics, including where data come from, study design, data management, and exploring and visualizing data. Learners will identify different types of data, and learn how to visualize, analyze, and interpret summaries for both univariate and multivariate data. Learners will also be introduced to the differences between probability and non-probability sampling from larger populations, the idea of how sample estimates vary, and how inferences can be made about larger populations based on probability sampling.

At the end of each week, learners will apply the statistical concepts they’ve learned using Python within the course environment. During these lab-based sessions, learners will discover the different uses of Python as a tool, including the NumPy, Pandas, Statsmodels, Matplotlib, and Seaborn libraries. Tutorial videos walk learners through creating visualizations and managing data, all within Python. This course utilizes the Jupyter Notebook environment within Coursera.

Week 1: INTRODUCTION TO DATA

In the first week of the course, we will review the course outline and discover the concepts and objectives to be mastered in the weeks to come. You will get an introduction to the field of statistics and explore a variety of perspectives the field has to offer. We will identify the many types of data that exist and observe where they can be found in everyday life. You will delve into basic Python functionality, along with an introduction to Jupyter Notebook. All of the course information on grading, prerequisites, and expectations is in the course syllabus, and you can find more information on our Course Resources page.

Learning Objectives

  • Develop an outlook for the course and summarize future concepts and objectives
  • Explore various uses of statistics and examine where data come from
  • Properly identify various data types and understand the different uses for each
  • Understand the basic functions of Python to import, clean, and manage data
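A minimal sketch of the import/clean/manage workflow, using only the Python standard library (`csv`, `io`) so it runs anywhere. The raw data and the `load_clean` helper are hypothetical. With Pandas, the same idea is usually written as `pd.read_csv` followed by `DataFrame.dropna`.

```python
import csv
import io

# Hypothetical raw CSV text; participant 2 has a missing age.
RAW = """id,age,smoker
1,34,yes
2,,no
3,51,no
4,27,yes
"""


def load_clean(text):
    """Parse CSV text, drop incomplete records, and convert column types."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        # DictReader fills short rows with None; treat None or blank as missing.
        if any(v is None or v.strip() == "" for v in rec.values()):
            continue
        rows.append({
            "id": int(rec["id"]),
            "age": int(rec["age"]),
            "smoker": rec["smoker"] == "yes",  # recode text to a boolean
        })
    return rows


data = load_clean(RAW)
mean_age = sum(r["age"] for r in data) / len(data)
print(f"{len(data)} complete records; mean age = {mean_age:.1f}")
```

Dropping incomplete rows is only one cleaning choice; whether to drop or impute depends on the analysis.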

Week 2: UNIVARIATE DATA

In the second week of this course, we will look at graphical and numerical summaries of a single variable (univariate data). In particular, we will create and analyze histograms, box plots, and numerical summaries for quantitative data, and bar charts and pie charts for categorical data. We will also interpret key numerical summaries such as the mean, interquartile range (IQR), and standard deviation. An assessment at the end of the week covers numerical summaries and their interpretation.

Learning Objectives

  • Understand the various graphical displays used for univariate categorical and quantitative data
  • Interpret histograms and boxplots to describe quantitative data
  • Interpret key numerical summaries used to describe quantitative data
  • Create histograms, box plots, and numerical summaries through Python
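The numerical summaries named above can be sketched with Python's built-in `statistics` module; the sample values are hypothetical. Quartile conventions differ between tools, so NumPy or Pandas defaults may report slightly different Q1, Q3, and IQR values for the same data.

```python
import statistics

# Hypothetical sample of one quantitative variable (sorted for readability).
values = [2, 4, 4, 5, 7, 9, 10, 12, 15, 22]

mean = statistics.mean(values)        # arithmetic mean
median = statistics.median(values)    # average of the two middle values here
sd = statistics.stdev(values)         # sample standard deviation, n - 1 denominator
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1                         # spread of the middle 50% of the data

print(f"mean={mean}, median={median}, sd={sd:.2f}, Q1={q1}, Q3={q3}, IQR={iqr}")
```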

Week 3: MULTIVARIATE DATA

In the third week of this course on looking at data, we’ll introduce key ideas for examining research questions that involve more than one variable. In particular, we will consider, both numerically and visually, how different variables interact, how summaries can be misleading if you don’t properly account for those interactions, and how quantitative and categorical variables differ. This week’s assessment is a writing assignment, along with reviews of your peers’ submissions.

Learning Objectives

  • Create graphs and summary statistics of multivariate data, both categorical and quantitative
  • Summarize important information obtained through visualizations of multivariate data
  • Communicate statistical ideas clearly and concisely to a broad audience
  • Integrate statistical reasoning into decisions and situations in your daily life
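One way summaries can mislead is Simpson's paradox: an association that holds within every subgroup reverses when the subgroups are pooled. The counts below are illustrative and the `success_rates` helper is hypothetical; treatment A has the higher success rate in both the mild and the severe group, yet the lower pooled rate, because A was given mostly severe cases.

```python
# Illustrative (successes, trials) counts for two treatments, split by a
# third variable (case severity).
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def success_rates(table):
    """Return per-subgroup and pooled success rates for each treatment."""
    rates = {}
    for treatment, groups in table.items():
        rates[treatment] = {g: s / n for g, (s, n) in groups.items()}
        total_s = sum(s for s, _ in groups.values())
        total_n = sum(n for _, n in groups.values())
        rates[treatment]["pooled"] = total_s / total_n  # ignores severity
    return rates


rates = success_rates(counts)
for key in ("mild", "severe", "pooled"):
    print(f"{key:>7}: A={rates['A'][key]:.3f}  B={rates['B'][key]:.3f}")
```

Comparing only the pooled row would point to the wrong conclusion; conditioning on severity reveals the interaction.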

Week 4: POPULATIONS AND SAMPLES

In this week, you’ll spend more time thinking about where data come from. The highest-quality statistical analyses of data will always incorporate information about the process used to generate the data, or features of the data collection design. You’ll be exposed to important concepts related to sampling from larger populations, including probability and non-probability sampling, and how we can make inferences about larger populations based on well-designed samples. You’ll also learn about the concept of a sampling distribution, and how estimation of the variance of that distribution plays a critical role in making statements about populations. Finally, you’ll learn about the importance of reading the documentation for a given data set; a key step in looking at data is also looking at the available documentation for that data set, which describes how the data were generated.

Learning Objectives

  • Distinguish between probability and non-probability sampling
  • Describe the concept of a sampling distribution, and how one can make inferences about a population parameter based on the estimated features of that distribution
  • Identify appropriate analytic techniques for probability and non-probability samples
  • Explain how poorly designed samples can lead to descriptions of population features that are biased in nature
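The idea of a sampling distribution can be illustrated by simulation: draw many simple random samples from a (hypothetical, simulated) finite population, compute each sample mean, and compare the spread of those means with the textbook standard error σ/√n. This is a sketch using the standard library only; the population, sample size, and number of replications are arbitrary choices.

```python
import random
import statistics


def sample_means(population, n, reps, seed=0):
    """Draw `reps` simple random samples without replacement; return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical finite population, generated with a fixed seed for reproducibility.
gen = random.Random(42)
population = [gen.gauss(50, 10) for _ in range(20_000)]

means = sample_means(population, n=25, reps=2_000)
empirical_se = statistics.stdev(means)                      # spread of the sampling distribution
theoretical_se = statistics.pstdev(population) / 25 ** 0.5  # sigma / sqrt(n), ignoring the fpc

print(f"empirical SE = {empirical_se:.3f}, sigma/sqrt(n) = {theoretical_se:.3f}")
```

Because every unit has a known, equal chance of selection, the simulated means center on the population mean; a convenience sample offers no such guarantee.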

Course 2: Inferential Statistical Analysis with Python

In this course, we will explore basic principles behind using data for estimation and for assessing theories. We will analyze both categorical data and quantitative data, starting with one-population techniques and expanding to handle comparisons of two populations. We will learn how to construct confidence intervals. We will also use sample data to assess whether or not a theory about the value of a parameter is consistent with the data. A major focus will be on interpreting inferential results appropriately.

At the end of each week, learners will apply what they’ve learned using Python within the course environment. During these lab-based sessions, learners will work through tutorials focusing on specific case studies to help solidify the week’s statistical concepts, which will include further deep dives into Python libraries including Statsmodels, Pandas, and Seaborn. This course utilizes the Jupyter Notebook environment within Coursera.

Week 1: OVERVIEW & INFERENCE PROCEDURES

In this first week, we’ll review the course syllabus and discover the various concepts and objectives to be mastered in the weeks to come. You’ll be introduced to inference methods and some of the research questions we’ll discuss in the course, as well as an overall framework for making decisions using data, considerations for how you make those decisions, and ways to evaluate errors you may have made. On the Python side, we’ll review some high-level concepts from the first course in this series, survey Python’s statistics landscape, and walk through intermediate-level Python concepts. All of the course information on grading, prerequisites, and expectations is in the course syllabus, and you can find more information on our Course Resources page.

Learning Objectives

  • Develop an outlook for the course and summarize future concepts and objectives
  • Explain the framework for making decisions using data along with the potential consequences of those decisions
  • Identify the basic concepts central to Bayesian and frequentist statistics, which will be used throughout this course
  • Write basic Python functions and interpret documentation

Week 2: CONFIDENCE INTERVALS

In this second week, we will learn about estimating population parameters via confidence intervals. You will be introduced to five different types of population parameters, assumptions needed to calculate a confidence interval for each of these five parameters, and how to calculate confidence intervals. Quizzes will appear throughout the week to test your understanding. In addition, you’ll learn how to create confidence intervals in Python.

Learning Objectives

  • Define a confidence interval
  • Determine the assumptions needed to calculate a confidence interval for each population parameter
  • Calculate confidence intervals by hand for one population proportion, difference in two population proportions, one population mean, one population mean difference for paired data, and difference in population means for independent groups
  • Demonstrate your understanding of confidence intervals by communicating statistical ideas clearly and concisely for a potential client
  • Create confidence intervals in Python
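As a sketch of one of the five intervals, the large-sample (Wald) interval for one population proportion is p̂ ± z*·√(p̂(1 − p̂)/n). The data below are hypothetical, and the code assumes the usual large-sample conditions hold. Statsmodels provides a ready-made version, `statsmodels.stats.proportion.proportion_confint`, including alternatives such as `method="wilson"` that behave better for small samples or proportions near 0 or 1.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for one population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 60 "yes" answers out of 100 respondents.
lower, upper = proportion_ci(60, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```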

Week 3: HYPOTHESIS TESTING

In week three, we’ll learn how to test hypotheses using the five analysis methods covered in the previous week. We’ll discuss the role of key factors and assumptions in hypothesis testing and learn to interpret our results. We will also review how to determine which procedure is appropriate for the research question at hand. Quizzes and a peer assessment will appear throughout the week to test your understanding.

Learning Objectives

  • Differentiate between various scenarios and determine the appropriate analysis method
  • Apply techniques of hypothesis testing and interpret the results
  • Run hypothesis tests in Python and interpret the output
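A minimal sketch of one such test, the two-sided one-sample z-test for a population proportion, where the standard error is computed under the null value p₀. The data are hypothetical; here the z statistic is (0.6 − 0.5)/√(0.25/100) = 2.0.

```python
import math
from statistics import NormalDist


def one_proportion_ztest(successes, n, p0):
    """Two-sided z-test of H0: p = p0 for a single population proportion."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)            # standard error assuming H0 is true
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p = one_proportion_ztest(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A p-value of about 0.046 would lead to rejecting H0 at the 5% level but not at the 1% level.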

Week 4: LEARNER APPLICATION

In the final week of this course, we will walk through several examples and case studies that illustrate applications of the inferential procedures discussed in prior weeks. Learners will see examples of well-formulated research questions related to the study designs and data sets that we have discussed thus far, and via both confidence interval estimation and formal hypothesis testing, we will formulate inferential responses to those questions.

Learning Objectives

  • Understand how the inferential procedures introduced in earlier weeks are married to well-formulated research questions
  • Review how the inferential procedures are applied and interpreted step by step when analyzing real data
  • Understand how accounting for complex sample designs can affect the inferential procedures discussed
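One simple way to see how a complex sample design can affect inference is Kish's approximation for the effect of unequal weights: the effective sample size is (Σw)²/Σw², and the design effect from weighting is n divided by that. The responses and weights below are hypothetical, and this approximation ignores clustering and stratification, which a full design-based analysis would also account for.

```python
def kish_effective_n(weights):
    """Kish's effective sample size under unequal weighting: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


# Hypothetical responses and survey weights (the last two units each
# represent four population members).
y = [3.0, 5.0, 4.0, 6.0, 2.0, 4.0]
weights = [1, 1, 1, 1, 4, 4]

weighted_mean = sum(w * v for w, v in zip(weights, y)) / sum(weights)
unweighted_mean = sum(y) / len(y)
n_eff = kish_effective_n(weights)
deff = len(weights) / n_eff  # variance inflation due to weighting alone

print(f"weighted mean = {weighted_mean}, unweighted = {unweighted_mean}")
print(f"n = {len(weights)}, effective n = {n_eff}, design effect = {deff}")
```

Weighting changes the point estimate and also shrinks the effective sample size, which widens confidence intervals.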

Course 3: Fitting Statistical Models to Data with Python

In this course, we will expand our exploration of statistical inference techniques by focusing on the science and art of fitting statistical models to data. We will build on the concepts presented in the Inferential Statistical Analysis course (Course 2) to emphasize the importance of connecting research questions to our data analysis methods. We will also focus on various modeling objectives, including making inferences about relationships between variables and generating predictions for future observations.

This course will introduce and explore various statistical modeling techniques, including linear regression, logistic regression, generalized linear models, hierarchical and mixed effects (or multilevel) models, and Bayesian inference techniques. All techniques will be illustrated using a variety of real data sets, and the course will emphasize different modeling approaches for different types of data sets, depending on the study design underlying the data (referring back to Course 1, Understanding and Visualizing Data with Python).

During lab-based sessions, learners will work through tutorials focusing on specific case studies to help solidify the week’s statistical concepts, which will include further deep dives into Python libraries including Statsmodels, Pandas, and Seaborn. This course utilizes the Jupyter Notebook environment within Coursera.

Week 1: OVERVIEW & CONSIDERATIONS FOR STATISTICAL MODELING

We begin this third course of the Statistics with Python specialization with an overview of what is meant by “fitting statistical models to data.” In this first week, we will introduce key model fitting concepts, including the distinction between dependent and independent variables, how to account for study designs when fitting models, assessing the quality of model fit, exploring how different types of variables are handled in statistical modeling, and clearly defining the objectives of fitting models.

Learning Objectives

  • Introduce statistics from a modeling perspective and understand what it means to fit models to data.
  • Examine models in the context of fit and uncertainty to evaluate their predictive power.
  • Distinguish between different types of variables and their roles when specifying statistical models.
  • Examine how study design features are accounted for when fitting models.
  • Describe the different inferential objectives that researchers may have when fitting models.

Week 2: FITTING MODELS TO INDEPENDENT DATA

In this second week, we’ll introduce you to the basics of two types of regression: linear regression and logistic regression. You’ll get the chance to think about how to fit models, how to assess how well those models fit, and to consider how to interpret those models in the context of the data. You’ll also learn how to implement those models within Python.

Learning Objectives

  • Describe the general ideas of regression, including the relationship between variables and predicting results for new observations.
  • Recognize which problem setups and research questions call for a regression framework, either linear or logistic.
  • Interpret output, especially slopes, from the regression models to understand more about the relationship between variables.
  • Assess model fit to determine how appropriate the models might be and limitations to relying on the models.
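For a single predictor, the least-squares slope and intercept have closed forms, b₁ = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and b₀ = ȳ − b₁x̄, and R² measures the share of variance in y explained by the fit. The sketch below computes them directly on hypothetical data. In Statsmodels the same model (and its logistic counterpart) is typically fitted with `statsmodels.formula.api.ols("y ~ x", data=df).fit()` or `smf.logit(...)`.

```python
import statistics


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b1, b0, R^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx              # expected change in y per one-unit increase in x
    intercept = my - slope * mx    # fitted line passes through (mean x, mean y)
    fitted = [intercept + slope * a for a in x]
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot  # R^2: share of variance explained


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = simple_linear_regression(x, y)
print(f"y_hat = {intercept:.2f} + {slope:.2f} x, R^2 = {r2:.3f}")
```

A high R² alone does not establish that the model is appropriate; residual plots are still needed to check the linearity and constant-variance assumptions.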

Week 3: FITTING MODELS TO DEPENDENT DATA

In the third week of this course, we will be building upon the modeling concepts discussed in Week 2. Multilevel and marginal models will be our main topic of discussion, as these models enable researchers to account for dependencies in variables of interest introduced by study designs. We’ll be covering why and when we fit these alternative models, likelihood ratio tests, as well as fixed effects and their interpretations.

Learning Objectives

  • Explore methods for modeling data with correlated observations.
  • Gain insight into the methods of multilevel and marginal modeling, why we may use them, and when to use one method versus another.
  • Dive deeper into likelihood ratio tests for determining statistically significant differences between models that do and do not account for correlational structures.
  • Compare and contrast inferences that can be made when fitting multilevel and marginal models.
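A likelihood ratio test compares nested models through LR = 2(ℓ_full − ℓ_reduced), referred to a χ² distribution with degrees of freedom equal to the number of extra parameters. The sketch below handles the one-parameter case using the identity P(χ²₁ > x) = erfc(√(x/2)). When the extra parameter is a variance component (for example, adding a random intercept), its null value of 0 lies on the boundary of the parameter space, and a commonly used correction is to halve the p-value (a 50:50 mixture of χ²₀ and χ²₁). The log-likelihood values are hypothetical; fitted Statsmodels results expose theirs through the `llf` attribute.

```python
import math


def lrt_one_df(llf_reduced, llf_full, boundary=False):
    """Likelihood ratio test for nested models that differ by one parameter."""
    lr = 2 * (llf_full - llf_reduced)                  # LR statistic
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2))   # P(chi-square_1 > lr)
    if boundary:
        # Null value sits on the boundary (a variance of 0): use a 50:50
        # mixture of chi-square_0 and chi-square_1, i.e. halve the p-value.
        p_value /= 2
    return lr, p_value


# Hypothetical maximized log-likelihoods, e.g. without and with a random intercept.
lr, p_naive = lrt_one_df(-120.0, -118.08)
_, p_mixture = lrt_one_df(-120.0, -118.08, boundary=True)
print(f"LR = {lr:.2f}, naive p = {p_naive:.4f}, boundary-corrected p = {p_mixture:.4f}")
```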

Week 4: SPECIAL TOPICS

In this final week, we introduce special topics that further extend the material from previous weeks and courses. We will cover a broad range of topics, such as various types of dependent variables, sampling methods and whether or not to use survey weights when fitting models, and in-depth case studies that use Bayesian techniques to derive insights from data. You’ll also have the opportunity to apply Bayesian techniques in Python.

Learning Objectives

  • Gain exposure to working with other types of outcomes that are common in certain fields such as ordinal, censored, and multinomial variables.
  • Introduce advanced modeling methods that are not a part of the common statistics curriculum.
  • Introduce Bayesian statistics and provide resources for learners to begin exploring methods unique to Bayesian inference.
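A minimal Bayesian sketch, assuming a binomial outcome with a conjugate Beta prior: a Beta(α, β) prior combined with k successes in n trials gives a Beta(α + k, β + n − k) posterior. The data and the uniform prior are hypothetical choices, and the 95% credible interval is approximated from Monte Carlo draws using `random.betavariate`.

```python
import random


def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + (trials - successes)


# Hypothetical data: 7 successes in 10 trials, with a uniform Beta(1, 1) prior.
a, b = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = a / (a + b)

# Monte Carlo 95% equal-tailed credible interval: drop 2.5% of draws in each tail.
rng = random.Random(0)
draws = sorted(rng.betavariate(a, b) for _ in range(20_000))
k = len(draws) // 40
credible = (draws[k], draws[-k - 1])
print(f"posterior Beta({a}, {b}), mean = {posterior_mean:.3f}, "
      f"95% CrI = ({credible[0]:.3f}, {credible[1]:.3f})")
```

Unlike a confidence interval, the credible interval is read directly as a probability statement about the parameter, given the prior and the data.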

Certificate

Languages

Language: Jupyter Notebook 100.0%