ignacioriveros1 / mlforpublicpolicylab

Repo for ML for Public Policy Lab course at CMU

10718, 94889: Machine Learning for Public Policy Lab

Previous Versions: Spring 2020

Fall 2020: Tues & Thurs, 3:20-4:40pm, Remote

This is a project-based course designed to provide students training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.

Through lectures, discussions, readings, and project assignments, students will learn about and gain experience building end-to-end machine learning systems, starting from project definition and scoping, through modeling, to field validation and turning their analysis into action. Throughout the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias and disparities, and evaluating the impact of deployed models.

Prerequisites: Students are expected to know Python and to have prior coursework in machine learning. The course assumes this background and focuses on how to use ML to solve real-world problems. Experience with SQL, the *nix command line, git(hub), and working on remote machines will be helpful.

DRAFT SYLLABUS

People

Instructors

Rayid Ghani
GHC 8023
Office Hours: TBD

Kit Rodolfa
GHC 8018
Office Hours: TBD

Teaching Assistants

Amartya Basu
Office Hours: TBD

Aaron Dunmore
Office Hours: TBD

TBD
Office Hours: TBD

Tentative Schedule

See the draft syllabus for much more detail, including information about group projects, grading, and helpful optional readings.

Week | Dates | Holidays? | Lecture/Discussion Topic | Project Activity | Goal | Required Readings | Deliverable / Expected Output
1 Tu: Sep 1
Th: Sep 3
Tu: Intro/Overview + Project Overviews
Th: Scoping, Problem Definition, Balancing goals (equity, efficiency, effectiveness)
Intro/Overview Get familiar with the class, goals, and understand project choices Thursday:
Data Science Project Scoping Guide
Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks
2 Tu: Sep 8
Th: Sep 10
Tu: Case Studies + Discussion
Th: Acquiring Data, Privacy, Record Linkage
Project Definition & Data Discovery Data Audit and Exploration

TA Sessions: SQL, Databases, GitHub
Tuesday:
Fine-grained dengue forecasting using telephone triage services
Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning
What Happens When an Algorithm Cuts Your Health Care
Beginning of week, team and project assignments
3 Tu: Sep 15
Th: Sep 17
Tu: Data Exploration
Th: Building ML Pipelines
Finalize Project Scope and Data Stories Tuesday:
TBD reading on data exploration
Practical Statistics for Data Scientists, Chapter 1

Thursday:
Architecting a Machine Learning Pipeline
ETL of some dataset (census?)
Data exploration
Scope refinement
4 Tu: Sep 22
Th: Sep 24
Analytical Formulation / Baselines Initial Data Science Pipeline Setup and Mockups
(problem formulation and validation process)
Tuesday:
Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations
Always Start with a Stupid Model, No Exceptions
First week of deep dives
Project Scope + Proposal with Descriptive Statistics
5 Tu: Sep 29
Th: Oct 1
Feature Engineering / Imputation Code Pipeline Development Iteration 1 - Build End to End Code Pipeline
(Focus on end-to-end shell)
Tuesday:
TBD Feature Development Case Study
Missing Data Conundrum
Skeleton Code (Pipeline), Mockups
Proposal Peer Reviews
6 Tu: Oct 6
Th: Oct 8
Performance Metrics / Evaluation Pt. I (splits, metrics) Tuesday:
Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
The Secrets of Machine Learning
Technical Modeling Plan (features, label definition(s), model specifications, etc)
7 Tu: Oct 13
Th: Oct 15
(Feb 24 drop deadline) Performance Metrics / Evaluation Pt. II (audition) Iteration 2 - End to End Code Pipeline
(Focus on feature development)
Tuesday:
Evaluating and Comparing Classifiers
Transductive Optimization of Top k Precision
Code (Pipeline), Initial Models (and analysis)
8 Tu: Oct 20
Th: Oct 22
Overfitting, Leakage, Issues in Deployment Tuesday:
Three Pitfalls to Avoid in Machine Learning
Leakage in Data Mining
Why is Machine Learning Deployment Hard?
Early Results: Correct but Crappy
9 Tu: Oct 27
Th: Oct 29
(prev wk spring brk) Model Interpretability Pt. I: global + postmodeling Iteration 3 - End to End Code Pipeline
(Focus on evaluation, results, and initial front-end demo)
Tuesday:
Interpretable Classification Models for Recidivism Prediction
Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission
Refined Feature List
10 Tu: Nov 3
Th: Nov 5
Model Interpretability Pt. II: local Tuesday:
Why Should I Trust You? Explaining the Predictions of any Classifier
Model Agnostic Supervised Local Explanations
Explainable machine-learning predictions for the prevention of hypoxaemia during surgery
Model Interpretation
11 Tu: Nov 10
Th: Nov 12
Bias and Fairness Pt I Tuesday:
Fairness Definitions Explained
A Theory of Justice, pages 1-19
Racial Equity in Algorithmic Criminal Justice [Focus on sections: I.B.2, all of section II, III introduction, III.B, and III.D.3]
Results (across models, features, metrics)
Add bias analysis methods
12 Tu: Nov 17
Th: Nov 19
Bias and Fairness Pt II Model selection, evaluation, balancing efficiency and equity Final model choice and understanding its performance and impact on disparities Tuesday:
A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions
Equality of Opportunity in Supervised Learning
Classification with fairness constraints: A meta-algorithm with provable guarantees
Draft Research Proposal Section
13 Tu: Nov 24
Th: Thanksgiving
Thanksgiving Causality and Field Validation Tuesday:
The seven tools of causal inference, with reflections on machine learning
TBD Field Trial Case Study
No deep dive - Thursday off
14 Tu: Dec 1
Th: Dec 3
Analysis to Action, Accountability and Transparency Communications & Transition Planning Project Report and Presentations
Field Trial Design
Tuesday:
Ethics and Data Science, entire book
Communicating Data with Tableau, Chapter 1
Teaching Statistics: A Bag of Tricks, Chapter 11
Last week of deep dives
Draft Field Trial Design Section
15 Tu: Dec 8
Th: Dec 10
Final Presentations Presentations Presentation
16 Dec 14 (Finals Wk) Final Report Due Final Report Report and Repo and Code Documentation

License: MIT License