The goal of this codebase is to implement the K-Means algorithm and to construct the phylogenetic tree (dendrogram) of the amino acids dataset using Agglomerative (bottom-up) and Divisive (top-down) hierarchical clustering. For Agglomerative clustering, we use at least three linkage criteria chosen from single, complete, average, Ward, and centroid.

What we achieve:
● Compare the Agglomerative and Divisive methods on the dataset and plot their phylogenetic trees (dendrograms).
● Cluster the same data using the K-Means algorithm. We chose K = 5 and ran the algorithm multiple times with different initial mean points.
● Compare the K-Means clusters with the clusters produced by the hierarchical clustering techniques (for every linkage chosen) and highlight the differences for that particular value of K.
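The K-Means step with multiple runs can be sketched as follows. This is a minimal illustration, not the codebase's own implementation: it assumes each amino acid record has already been encoded as a numeric feature vector (the encoding is not specified here), and the names `kmeans_once`, `kmeans`, `n_runs`, and `seed`, as well as the toy points, are hypothetical. Keeping the run with the lowest within-cluster sum of squares is one common way to pick among restarts and is an assumption, not something stated above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run (Lloyd's iterations) from k randomly sampled points."""
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return labels, centroids, inertia


def kmeans(points, k=5, n_runs=10, seed=0):
    """Repeat K-Means with different initial means; keep the lowest inertia."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy 2-D data standing in for encoded amino acid features (hypothetical).
toy_centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 20)]
toy_offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
toy_points = [(cx + dx, cy + dy) for cx, cy in toy_centers for dx, dy in toy_offsets]

labels, centroids, inertia = kmeans(toy_points, k=5, n_runs=10, seed=0)
```

The returned `labels` list gives one cluster index per input point, so it can be compared directly against the flat clusters obtained by cutting a dendrogram at five clusters.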