A selection of machine learning resources
Uses
- Additionally, Vincent Warmerdam suggests training an autoencoder and clustering its latent space. This gives outlier detection and sampling in the latent space for free.
Resource | Comment |
---|---|
Youtube: 4 Basic Types of Cluster Analysis used in Data Analytics (Decisive Data) | TBD |
Resource | Comment |
---|---|
Youtube: Assessing the quality of a clustering (Christian Hennig @ PyData) | TBD |
Resource | Comment |
---|---|
Youtube: K Means Clustering (Siraj Raval) | TBD |
Youtube: K Means Clustering (StatQuest) | TBD |
Youtube: K Means Clustering (Andrew Ng) | TBD |
Youtube: K Means Clustering (Luis Serrano) | Hierarchical Clustering too |
Youtube: K Means Clustering (Batool Arhamna Haider) | Within Cluster and Between Cluster Distances |
TowardsDataScience: Understanding K-Means | Also has K-Medoids |
Library | Comment |
---|---|
sklearn.cluster.KMeans | Standard K-Means clustering (initialization can be `random` or `k-means++`) |
sklearn.cluster.MiniBatchKMeans | Mini-Batch K-Means clustering |
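
A minimal sketch of how the two estimators in the table above might be used. The synthetic two-blob data, the cluster count, and the batch size are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

rng = np.random.default_rng(0)

# Toy data (an assumption for this sketch): two well-separated 2-D blobs.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(100, 2)),
])

# init can be "k-means++" or "random"; n_init restarts guard against bad seeds.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)

# MiniBatchKMeans updates the centroids from small random batches,
# trading a little accuracy for speed on large datasets.
mbk = MiniBatchKMeans(n_clusters=2, batch_size=50, n_init=3, random_state=0).fit(X)

print(km.cluster_centers_)
print(mbk.cluster_centers_)
```

On data this clean both estimators should recover the same two groups; differences tend to show up on larger or noisier datasets.
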
- Can K Means handle categorical data?
- K-Means is not directly applicable to categorical data: categorical variables are discrete and have no natural origin or ordering, so computing a Euclidean distance in such a space is not meaningful.
- NOT CHECKED YET: Can k-modes be used for it?
- Blog: Clustering mixed data
- Why can't we just scale to [0;1] or [-1;1] range?
- Video: KModes algorithm
- Video: Mixed mode and means clustering K-Prototypes
- Video: K Modes Intuition
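
The point about Euclidean distance on categorical data can be illustrated with a small sketch. The feature values, the integer coding, and the helper names below are hypothetical. The matching dissimilarity and the per-attribute mode are the building blocks usually associated with k-modes, shown here in isolation rather than as a full clustering implementation.

```python
from collections import Counter

# Hypothetical categorical feature, integer-coded in an arbitrary order.
code = {"red": 0, "green": 1, "blue": 2}

def distance_on_codes(a, b):
    """Distance K-Means would see if the integer codes were treated as numbers."""
    return abs(code[a] - code[b])

def matching_dissimilarity(rec_a, rec_b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(rec_a, rec_b))

def mode_of(records):
    """Per-attribute most frequent category: the categorical analogue of a centroid."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# The coding makes "blue" look twice as far from "red" as "green" is,
# although no such ordering exists among the categories.
print(distance_on_codes("red", "green"))  # 1
print(distance_on_codes("red", "blue"))   # 2

records = [("red", "small"), ("red", "large"), ("blue", "small")]
print(matching_dissimilarity(records[0], records[2]))  # 1
print(mode_of(records))  # ('red', 'small')
```
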
Resource | Comment |
---|---|
Youtube: Hierarchical Clustering (Luis Serrano) | K Means Clustering too |
Youtube: Hierarchical Clustering (StatQuest) | TBD |
DBSCAN typically has two parameters: eps and minPoints
- eps is the maximum distance between two points for them to count as neighbours
- minPoints is the minimum number of neighbours (within eps) a point needs to be the core of a dense region, and therefore to start a cluster
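
A minimal sketch of the two DBSCAN parameters in practice, using `sklearn.cluster.DBSCAN`, where minPoints is called `min_samples`. The toy data and the parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Toy data (an assumption for this sketch): two tight blobs plus one isolated point.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.1, size=(50, 2)),
    [[10.0, 10.0]],
])

# eps: neighbourhood radius; min_samples: neighbours needed for a core point.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_

# Points that belong to no dense region are labelled -1 (noise).
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, labels[-1])
```

Unlike K-Means, the number of clusters is not passed in; it follows from the density implied by the two parameters.
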
Resource | Comment |
---|---|
Youtube: Density Based Clustering (Brian Kent @ PyData) | TBD |
Youtube: HDBSCAN (John Healy @ PyData) | TBD |
Library | Comment |
---|---|
sklearn.cluster.DBSCAN | TBD |
scikit-learn-contrib.hdbscan | Hierarchical density based clustering |
debacl | Uses Level Set Trees |
Each cluster is described by a Gaussian distribution, and the data is modelled as a mixture of these distributions
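
A minimal sketch of clustering with Gaussian distributions, assuming scikit-learn's `GaussianMixture` as the implementation (the resources here do not name one). The one-dimensional toy data and the component count are also assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data (an assumption for this sketch): samples from two 1-D Gaussians.
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(200, 1)),
    rng.normal(loc=3.0, scale=0.5, size=(200, 1)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)        # hard assignment: most likely component
probs = gmm.predict_proba(X)   # soft assignment: one probability per component

print(np.sort(gmm.means_.ravel()))
print(gmm.weights_)
```

The soft assignments in `probs` are the main practical difference from K-Means: each point gets a membership probability per cluster instead of a single label.
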
Resource | Comment |
---|---|
Hands on machine learning (Aurélien Geron): Classification notebook | MNIST, Precision, Recall, Confusion Matrix, ROC |