whythawk / data-as-a-science

Lesson guide and textbook for "Data as a Science" course.

Module 2 - Lesson 3: Reflective equilibrium, and methods for multiple regression

Issue opened by turukawa.

ETHICS

Judge ethical responses – and states of reflective equilibrium – when justifying causing harm based on analysis.

John Rawls’ “reflective equilibrium” and the trolley-problem variants “tractor” and “tumble”. In effect: what happens when your choice enables a third party to kill a person?
Examples: Microsoft and the US Supreme Court on data privacy in Ireland; Yahoo and Chinese journalists; can you kill a clinically dead (but still “alive”) person for their organs?

CURATION

Assess whether data – which in the wrong hands could cause harm – should have a purge function.

Dead man’s switches and data: if a dataset could be used to cause harm, should there be a process to auto-delete it when certain conditions occur? How do you manage backups (data at rest) in such a case?
Examples: New York’s register of undocumented immigrants under Donald Trump; algorithms that identify gay people.
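
One way to make the purge question concrete is a minimal sketch of a dead man’s switch. Everything here is an assumption for illustration: the `DeadMansSwitch` class, its field names, and the trigger condition (a custodian failing to check in within a set period) are hypothetical, and the backups are modelled simply as extra file paths that must be registered alongside the primary copy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass
class DeadMansSwitch:
    """Purge a dataset unless a custodian keeps checking in (hypothetical design)."""

    data_paths: list[Path]     # the primary file AND every backup copy
    max_silence: timedelta     # how long the custodian may go without checking in
    last_check_in: datetime

    def check_in(self, now: datetime) -> None:
        """Record that the custodian is still in control of the data."""
        self.last_check_in = now

    def should_purge(self, now: datetime) -> bool:
        """True once the custodian has been silent for longer than allowed."""
        return now - self.last_check_in > self.max_silence

    def purge_if_due(self, now: datetime) -> list[Path]:
        """Delete every registered file if the switch has tripped.

        Returns the paths that were actually removed.
        """
        if not self.should_purge(now):
            return []
        removed = []
        for path in self.data_paths:
            if path.exists():
                # NOTE: unlinking removes the file entry; it is not a secure erase.
                path.unlink()
                removed.append(path)
        return removed
```

The sketch also shows where the backup problem bites: a copy that is not listed in `data_paths` (an offline tape, a colleague’s laptop) survives the purge, so the register of copies matters as much as the trigger.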

ANALYSIS

Prepare multiple regressions and inference for true slopes.

Includes adjusted R² and bootstrapping scatter plots, plus testing null hypotheses about true slopes.
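
The pieces listed above can be sketched together in a few lines. This is an illustration under stated assumptions, not the course’s own code: it uses NumPy, synthetic data in which `y` depends on `x1` but not on `x2`, and hypothetical helper names (`fit_ols`, `bootstrap_slopes`). Adjusted R² is computed as 1 − (1 − R²)(n − 1)/(n − k − 1) for n rows and k predictors, and the null hypothesis that a true slope is zero is checked by asking whether a 95% percentile bootstrap interval excludes zero.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns (coefficients with intercept first, adjusted R^2).
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, adj_r2


def bootstrap_slopes(X, y, n_boot=2000, seed=0):
    """Resample rows with replacement and refit; returns an (n_boot, k) array of slopes."""
    rng = np.random.default_rng(seed)
    n = len(y)
    slopes = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        beta, _ = fit_ols(X[idx], y[idx])
        slopes[b] = beta[1:]              # drop the intercept
    return slopes


# Synthetic data (illustrative only): the true slope is 2 for x1 and 0 for x2.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

beta, adj_r2 = fit_ols(X, y)
slopes = bootstrap_slopes(X, y)
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5], axis=0)

for j in range(X.shape[1]):
    # Reject H0 (true slope_j = 0) at roughly the 5% level if the interval excludes 0.
    reject = not (ci_low[j] <= 0.0 <= ci_high[j])
    print(f"x{j + 1}: slope={beta[j + 1]:.2f}, "
          f"95% CI=({ci_low[j]:.2f}, {ci_high[j]:.2f}), reject H0={reject}")
print(f"adjusted R^2 = {adj_r2:.3f}")
```

Plotting each column of `slopes` as a histogram, or overlaying the refitted lines on the scatter plot, gives the bootstrapped picture of how much the slope estimates move from resample to resample.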

PRESENTATION

Ensure ambiguity in data, and risks in outcomes, are reflected in visualisation and results.

“as scientists aggregate and summarise large heterogenous data, risks of ambiguity rise” (cf. Anscombe’s Quartet).
Error bars and uncertainty in visualisation.
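
A small sketch of why error bars matter, using made-up measurements: two hypothetical groups with identical means but very different spread. The function names, the data, and the output file name are all assumptions for illustration. The interval is the normal approximation mean ± 1.96 · s/√n; for samples this small a t-multiplier would give a wider and more appropriate interval. The plotting helper assumes matplotlib is installed.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, z * std_err


def plot_groups(groups, path="error_bars.png"):
    """Plot each group's mean with its error bar and save the figure to `path`."""
    # Imported here so the summary function above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    labels = list(groups)
    stats = [mean_with_interval(groups[name]) for name in labels]
    positions = list(range(len(labels)))

    fig, ax = plt.subplots()
    ax.errorbar(positions, [m for m, _ in stats], yerr=[h for _, h in stats],
                fmt="o", capsize=4)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylabel("Mean (approx. 95% interval)")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical measurements: the same mean, very different uncertainty.
groups = {
    "A": [9.8, 10.1, 10.0, 10.2, 9.9],
    "B": [4.0, 16.0, 7.0, 13.0, 10.0],
}
summary = {name: mean_with_interval(vals) for name, vals in groups.items()}
for name, (mean, half_width) in summary.items():
    print(f"{name}: {mean:.2f} ± {half_width:.2f}")
```

A bar chart of the two means alone would show the groups as indistinguishable; calling `plot_groups(groups)` makes the difference in certainty visible.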


CASE STUDY

Multiple predictors for a single condition, and identifying which to prioritise; recognising uncertainty and the potential for wrong policy changes.
Dataset: NIH National Cancer Institute Genomic Data Commons (GDC) leukaemia dataset.