zdelrosario / data-science-curriculum

Home Page: https://zdelrosario.github.io/data-science-curriculum/index.html

data-science-curriculum

This is a curriculum of open-source data science exercises, intended to take a student from zero coding experience to basic data science literacy. These exercises are heavily inspired by the (discontinued) Data Challenge Lab at Stanford University and rely on the Tidyverse.

Please see our JOSE paper for more info.

How to Use This Repo

  1. (Setup) Complete this exercise to install RStudio.
  2. (Setup) Download and unzip this archive to obtain the curriculum materials.
  3. (Setup) Open the folder you unzipped as a project in RStudio (File > Open Project...).
  4. (Learn) Work through the exercise files in the exercises_sequenced/ folder at your own pace to learn data science skills.
  5. (Learn) Use the challenges in the challenges/ folder to put your new skills to use.

Suggested order: The exercise filenames start with a numerical dXY prefix that denotes their suggested day-order. This ordering interleaves topics and provides about an hour of work per day. I recommend working 5 days a week on the exercises and taking weekends off!
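
As a rough illustration of the dXY convention, the sketch below sorts a few filenames by their day prefix using base R. The filenames are hypothetical placeholders, and the helper `day_from_filename` is not part of the curriculum; it also assumes that XY is always a two-digit number, which the real files may or may not follow.

```r
# Hypothetical filenames for illustration only; the real names in
# exercises_sequenced/ may look different after the dXY prefix.
files <- c("d03-example-c.Rmd", "d01-example-a.Rmd", "d02-example-b.Rmd")

# Hypothetical helper: extract the day number from a leading "dXY" prefix.
# Assumes XY is exactly two digits; returns NA for names without the prefix.
day_from_filename <- function(name) {
  has_prefix <- grepl("^d[0-9]{2}", name)
  day <- rep(NA_integer_, length(name))
  day[has_prefix] <- as.integer(substr(name[has_prefix], 2, 3))
  day
}

# Order the files by suggested day
ordered <- files[order(day_from_filename(files))]
print(ordered)
```

To try this on the actual materials, one could replace the placeholder vector with `files <- list.files("exercises_sequenced")`, run from the unzipped project folder.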

Table of Contents

  • Curriculum contains the desired learning outcomes of this material
  • Exercises contains the exercises, which provide a first introduction to using the Tidyverse to do Data Science
  • Challenges contains more open-ended data challenges, which will test and build upon your skills from the exercises
  • Content visualization is a script that helps sequence course content and visualize topics
  • Sequencing is a script that helps assign exercise and challenge due dates

Course Description

Data Science is a powerful toolkit for extracting usable insights from data. In this class, you will learn tools and gain understanding. You will use software tools to liberate data from published images and tables, wrangle messy datasets into machine learning (ML)-ready form, fit and interpret ML models, and visualize data to extract meaning. You will also speak the language of uncertainty (statistics) to avoid getting fooled by models. You will criticize published findings and ask what is, and what is not, in the data. Assignments will include regular practice exercises, progressively puzzling real-data challenges, and a final project of your choice where you obtain, wrangle, and understand a dataset.

Contributing

I welcome suggestions and contributions! If you want to contribute, please see Contributing.

License: Creative Commons Attribution-ShareAlike 4.0 International


Languages

HTML 93.3%, TeX 2.8%, R 2.0%, Makefile 1.0%, Python 0.5%, Shell 0.4%