Reproducing examples from "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman in Python using its popular libraries: numpy, math, scipy, sklearn, pandas, tensorflow, statsmodels, sympy, catboost, pyearth, mlxtend. Almost all plotting is done with matplotlib, occasionally with seaborn.
The documented Jupyter Notebooks are in the examples folder:
Classifying points from a mixture of Gaussians using linear regression, nearest neighbors, logistic regression with a natural cubic splines basis expansion, neural networks, support vector machines, flexible discriminant analysis over MARS regression, mixture discriminant analysis, K-means clustering, Gaussian mixture models and random forests.
Predicting prostate-specific antigen using ordinary least squares, ridge/lasso regularized linear regression, principal components regression, partial least squares and best subset regression. Model parameters are selected by K-fold cross-validation.
Understanding risk factors using logistic regression, L1 regularized logistic regression, a natural cubic splines basis expansion for nonlinearities, thin-plate splines for mutual dependency, local logistic regression, kernel density estimation and Gaussian mixture models.
Vowel speech recognition using regression of an indicator matrix, linear/quadratic/regularized/reduced-rank discriminant analysis and logistic regression.
Comparing patterns of bone mineral density relative change for men and women using smoothing splines.
Phoneme speech recognition using reduced-flexibility logistic regression.
Analysing the radial velocity of galaxy NGC7531 using local regression in multidimensional space.
Analysing the factors influencing ozone concentration using local regression and trellis plots.
Detecting email spam using logistic regression, generalized additive logistic model, decision tree, multivariate adaptive regression splines, boosting and random forest.
Analysing the factors influencing California house prices using boosting over decision trees and partial dependence plots.
Predicting shopping mall customers' occupations, and hence identifying demographic variables that discriminate between occupational categories, using boosting and market basket analysis.
Recognizing small hand-drawn digits using LeCun's Net-1 to Net-5 neural networks.
Analysing the variation of the handwritten digit three in ZIP code data using principal component and archetypal analysis.
Analysing microarray data using K-means clustering and hierarchical clustering.
Analysing country dissimilarities using K-medoids clustering and multidimensional scaling.
Analysing signature shapes using Procrustes transformation.
Recognizing wave classes using linear, quadratic, flexible (over MARS regression) and mixture discriminant analysis, and decision trees.
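As a flavor of the workflow these notebooks follow, here is a minimal sketch (not taken from any notebook) of selecting a lasso penalty by K-fold cross-validation with scikit-learn, in the spirit of the prostate antigen example; the synthetic data, coefficients, and seed are assumptions made for illustration:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prostate data: 100 samples, 8 predictors,
# of which only three truly influence the response.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
true_coef = np.array([1.5, 0.5, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n)

# Standardize predictors, then pick the lasso penalty by 5-fold CV.
X_std = StandardScaler().fit_transform(X)
model = LassoCV(cv=5, random_state=0).fit(X_std, y)

print("selected alpha:", model.alpha_)
print("coefficients:", model.coef_)
```

The CV-selected penalty shrinks the irrelevant coefficients toward zero while keeping the informative ones large, which is the pattern the regularized-regression notebooks examine on the real data.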