ashleypng / stat-ml-edu

Resources for education in statistics and machine learning: from advanced undergraduate to research level

Statistics and machine learning: from undergraduate to research

by Edgar Dobriban, Associate Prof. of Statistics & Data Science, Wharton, with a secondary appointment in Computer and Information Science, Univ. of Pennsylvania

  • This repository contains links to references (books, courses, etc.) that are useful for learning statistics and machine learning (as well as some neighboring topics). References for background material such as linear algebra, calculus/analysis/measure theory, probability theory, etc., are usually not included.

  • The level of the references starts from advanced undergraduate stats/math/CS and in some cases goes up to the research level. The books are often standard references and textbooks, used at leading institutions. In particular, several of the books are used in the standard curriculum of the PhD program in Statistics at Stanford University (where I learned from them as well), as well as at the University of Pennsylvania (where I work). It is hoped that the list benefits students, researchers seeking to enter new areas, and lifelong learners.

  • The list is highly subjective and incomplete, reflecting my own preferences, interests and biases. For instance, there is an emphasis on theoretical material. Most of the references included here are ones that I have studied at least partially (and sometimes extensively) and found helpful. Others are on my to-read list. Several topics are omitted due to lack of expertise (e.g., causal inference, Bayesian statistics, time series, sequential decision-making, functional data analysis, biostatistics, ...).

  • The links point to freely available author copies when those exist, and to online marketplaces otherwise (you are encouraged to search for the best price).

  • How to use these materials to learn: To be an efficient researcher, one must master certain core material. However, there is so much specialized knowledge that trying to know it all can be overwhelming. Fortunately, it is often enough to know what types of results/methods/tools are available, and where to find them. Then, whenever they are needed during a research project, they can be looked up and used.

  • Please feel free to contact me with suggestions.

Statistics

Principles and overview

Statistical Methodology

Statistical Theory

Core Theory: First Year PhD Curriculum

Advanced Theory

This section is the most detailed one, as it is the closest to my research.

Non-parametrics, minimax lower bounds

  • Tsybakov: Introduction to Nonparametric Estimation - The first two chapters contain many core results and techniques in nonparametric estimation, including lower bounds (Le Cam, Fano, Assouad).
  • Weissman, Ozgur, Han: Stanford EE 378 Course Materials. Lecture Notes - Possibly the most comprehensive set of materials on information-theoretic lower bounds, covering both estimation and testing (Ingster's method), with examples from high-dimensional problems, optimization, etc.
  • Johnstone: Gaussian estimation: Sequence and wavelet models - Beautiful overview of estimation in Gaussian noise (shrinkage, wavelet thresholding, optimality). Rigorous and deep, with challenging exercises.

Overviews of statistical machine learning theory

Semiparametrics

Multivariate statistical analysis

Subsampling

Empirical processes

High-dimensional (mean-field, proportional-limit) asymptotics; random matrix theory (RMT) for stats+ML

Machine Learning

ML Theory

Deep Learning

DL Practice

DL Theory

This area is under active development and research; there is no complete reference.

Uncertainty quantification

Complements

Optimization

Probability

Concentration inequalities

Chaining
