scikit-learn v0.24.1
Section | Title | Contents |
---|---|---|
00 | Getting Started | Estimators, Transformers, Preprocessors, Pipelines, Model Evaluation, Parameter Searches, Next Steps |
01 | Linear Models | OLS, LS, Ridge, Lasso, Elastic-Net, Least Angle Regression (LARS), LARS Lasso, OMP, Bayes, Generalized Linear Models (GLM), Tweedie Models, Stochastic Gradient Descent (SGD), Perceptrons, Passive-Aggressive Algos, RANSAC, Huber, Theil-Sen, Polynomial Regression |
01a | Logistic Regression | Basics, Examples |
02 | Discriminant Analysis | LDA QDA Math Foundations, Shrinkage, Estimators |
03 | Kernel Ridge Regression | KRR vs SVR |
04 | Support Vector Machines (SVMs) | Classifiers (SVC, NuSVC, LinearSVC), Regressors (SVR, NuSVR, LinearSVR), Scoring, Weights, Complexity, Kernels |
05 | Stochastic Gradient Descent (SGD) | Classifier Classifier (Multiclass) Classifier (Weighted) Solvers Regressors Sparse Data; Complexity; Stopping/Convergence; Tips |
06 | K Nearest Neighbors (KNN) | Algos (Ball Tree, KD Tree, Brute Force), Radius-based Classifiers, Radius-based Regressors, Nearest Centroid Classifiers, Caching, Neighborhood Components Analysis (NCA) |
07 | Gaussian Processes (GPs) | GP Regressors |
08 | Cross Decomposition | Partial Least Squares (PLS), Canonical PLS, SVD PLS, PLS Regression, Canonical Correlation Analysis (CCA) |
09 | Naive Bayes (NB) | Gaussian NB, Multinomial NB, Complement NB, Bernoulli NB, Categorical NB, Out-of-core fitting |
10 | Decision Trees (DTs) | Classifiers Graphviz Regressions Multiple Outputs Extra Trees Complexity, Algorithms Gini, Entropy, Misclassification Minimal cost-complexity Pruning |
11a | Ensembles/Bagging | Methods Random Forests, Extra Trees Parameters, Parallel Execution, Feature Importance Random Tree Embedding |
11b | Ensembles/Boosting | AdaBoost Gradient Boosting (GBs) GB Classifiers GB Regressions Tree Sizes, Math (TODO), Loss Functions, Shrinkage, Subsampling, Feature Importance Histogram Gradient Boosting (HGB) HGB - Monotonic Constraints Stacked Generalization |
11c | Ensembles/Voting | Hard Voting, Soft Voting, Voting Regressor |
11d | Ensembles/General Stacking | Summary |
12 | Multiclass/Multioutput Problems | Label Binarization One vs Rest (OvR), One vs One (OvO) Classification Output Codes Multilabel, Multioutput Classification Classifier Chains Multioutput Regressions Regression Chains |
13 | Feature Selection (FS) | Removing Low-Variance Features, Univariate FS |
14 | Semi-Supervised/Unsupervised Learning | Self-Training Classifier, Label Propagation, Label Spreading |
15 | Isotonic Regression | Example |
16 | Calibration Curves | Intro/Example, Cross-Validation, Metrics Regressors |
17 | Perceptrons | Intro, Classification, Regression, Regularization, Training, Complexity, Tips |
21 | Gaussian Mixtures (GMs) | Expectation Maximization, Variational Bayes GM |
22 | Manifolds | Isomap, Locally Linear Embedding (LLE), Modified LLE, Hessian LLE, Local Tangent Space Alignment (LTSA), Multidimensional Scaling (MDS) Random Trees Embedding, Spectral Embedding, t-SNE, Neighborhood Components Analysis (NCA) |
23 | Clustering | K-Means, Voronoi Diagrams, Affinity Propagation, Mean Shift, Spectral Clustering, Agglomerative Clustering, Dendrograms, Connectivity Constraints, Distance Metrics, DBSCAN, OPTICS, BIRCH |
23a | Clustering Metrics | Rand Index, Mutual Info Score, Homogeneity, Completeness, V-Measure, Fowlkes-Mallows, Silhouette Coefficient, Calinski-Harabasz, Davies-Bouldin Contingency Matrix Pair Confusion Matrix |
24 | Biclustering | Spectral Co-Clustering, Spectral Bi-Clustering Metrics |
25 | Component Analysis / Matrix Factorization | PCA, Incremental PCA, PCA w/ Random SVD, PCA w/ Sparse Data, Kernel PCA Dimension Reduction Comparison Truncated SVD / LSA Dictionary Learning Factor Analysis Independent Component Analysis Non-Negative Matrix Factorization (NNMF) Latent Dirichlet Allocation (LDA) |
26 | Covariance | Empirical CV, Shrunk CV, Max Likelihood Estimation (MLE) Ledoit-Wolf Shrinkage, Oracle Approximating Shrinkage Sparse Inverse CV, aka Precision Matrix Mahalanobis Distance |
27 | Novelties & Outliers | One-Class SVMs, Elliptic Envelope, Isolation Forest, Local Outlier Factor |
28 | Density Estimation (DE) | Histograms, Kernel DE |
29 | Restricted Boltzmann Machines (RBMs) | Intro, Training |
31 | Cross Validation (CV) | Intro, Metrics Parameter Estimation, Pipelines, Prediction Plots, Nesting K-Fold, Stratified K-Fold Leave One Out, Leave P Out Class Label CV Grouped Data CV Predefined Splits Time Series Splits Permutation Testing Visualizations |
32 | Parameter Tuning | Grid Search, Randomized Optimization Successive Halving Composite Estimators & Parameter Spaces Alternative to Brute Force Info Criteria (AIC, BIC) |
33 | Metrics & Scoring (Intro) | scoring, make_scorer |
33a | Classification Metrics | Accuracy, Top-K Accuracy, Balanced Accuracy Cohen's Kappa Confusion Matrix Classification Report Hamming Loss Precision, Recall, F-Measure, Precision-Recall Curve, Average Precision Jaccard Similarity, Hinge Loss, Log Loss, Matthews Correlation Coefficient Receiver Operating Characteristic (ROC) Curves, ROC-AUC Detection Error Tradeoff (DET), Zero One Loss, Brier Score |
33b | Multilabel Ranking Metrics | Coverage Error, Label Ranking Avg Precision (LRAP), Label Ranking Loss Discounted Cumulative Gain (DCG), Normalized DCG |
33c | Regression Metrics | Explained Variance, Max Error, Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Squared Log Error (MSLE), Mean Absolute Pct Error (MAPE) R^2 score, aka Coefficient of Determination Tweedie Deviances |
33d | Dummy Metrics | Dummy Classifiers, Dummy Regressors |
34 | Validation Curves | Example, Validation Curve, Learning Curve |
41 | Viz/Inspection | 2D PDPs, 3D PDPs, Individual Conditional Expectation (ICE) Plot |
42 | Viz/Permutations | Permutation Feature Importance (PFI), Impurity vs Permutation Metrics |
50a | Viz/ROC Curves | ROC Curve |
50b | Viz/Custom PDP Plots | Example |
50c | Viz/Classification Metrics | Confusion Matrix, ROC Curve, Precision-Recall Curve |
61 | Composite Transformers | Pipelines, Caching, Examples Regression Target xforms Feature Unions Column Transformers |
62a | Text Feature Extraction | Bag of Words (BoW) Sparsity, Count Vectorizer, Stop Words, Tf-Idf Binary Markers, Text file decoding, Hashing Trick Out-of-core Scaling, Custom Vectorizers |
62b | Image Patch Extraction | Extract from Patches, Reconstruct from Patches Connectivity Graphs |
63 | Data Preprocessing | Scaling, Quantile Transforms, Power Maps (Box-Cox, Yeo-Johnson) Category Coding, One-Hot Coding Quantization aka Binning Feature Binarization |
64 | Missing Value Imputation | Univariate, Multivariate, Multiple-vs-Single, Nearest-Neighbors Marking Imputed Values |
66 | Dimensionality Reduction/Random Projections | Johnson-Lindenstrauss lemma Gaussian RP, Sparse RP, Empirical Validation |
67 | Kernel Approximations | Nystroem RBF Sampler, Additive Chi-Squared Sampler, Skewed Chi-Squared Sampler Polynomial Sampling - Tensor Sketch |
68 | Pairwise Ops | Distances vs Kernels Cosine Similarity, Linear / Polynomial / Sigmoid / RBF / Laplacian / Chi-Squared kernels |
69 | Transforming Prediction Targets | Label Binarization, Multilabel Binarization, Label Encoding |
71 | Example Datasets | Boston, Iris, Diabetes, Digits, Linnerud, Wine, Breast Cancer Olivetti faces, 20 newsgroups, Labeled faces, Forest covertypes, Reuters corpus, KDD, Cal housing |
73 | Artificial Data | random-nclass-data, Gaussian blobs, Gaussian quantiles, Circles, Moons, Multilabel class data, Hastie data BiClusters, Checkerboards Regression, Friedman1/2/3 S-Curve, Swiss Roll Low-Rank Matrix, Sparse Coded Signal, Sparse Symmetric Positive Definite (SPD) Matrix |
74 | Other Data | Sample images, SVMlight/LibSVM formats, OpenML pandas.io, scipy.io, numpy.routines.io, scikit-image, imageio, scipy.io.wavfile |
81 | Scaling | Out-of-core ops (BUG = TODO) |
82 | Latency | Bulk-vs-atomic ops, Latency vs Validation, Latency vs #Features, Latency vs Datatype, Latency vs Feature Extraction Linear Algebra Libs (BLAS, LAPACK, ATLAS, OpenBLAS, MKL, vecLib) |
83 | Parallelism | JobLib, OpenMP, NumPy Oversubscription config switches |
90 | Persistence | Pickle, Joblib |