This repository contains an implementation of three main algorithms to compute archetypes:
- Original method (AA_Original), as proposed in the original paper.
- Principal Convex Hull method (AA_PCHA), as proposed in Archetypal analysis for machine learning and data mining.
- Adaptation of the Frank-Wolfe algorithm (AA_Fast), as proposed in Archetypal Analysis as an Autoencoder.
I developed this code as part of my Mathematics Undergraduate Thesis on Archetypal Analysis at UAM. Find the original Thesis in Spanish here, and an English translation here.
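To give a flavour of the Frank-Wolfe idea behind AA_Fast, here is a minimal sketch of one sub-problem: finding the convex weights that best reconstruct a single point from a fixed set of archetypes. It is not the code in this repository. The function name `frank_wolfe_weights`, its parameters, and the classic `2 / (t + 2)` step size are assumptions made for illustration, and it uses plain Python lists to stay dependency-free.

```python
def frank_wolfe_weights(x, archetypes, n_iter=500, tol=1e-12):
    """Approximate x as a convex combination of the rows of `archetypes`.

    Hypothetical helper: minimises ||x - w Z||^2 over the probability simplex
    with the textbook Frank-Wolfe iteration (step size 2 / (t + 2)).
    """
    k, dim = len(archetypes), len(x)
    weights = [1.0 / k] * k  # start at the barycenter of the simplex
    for t in range(n_iter):
        # Current reconstruction w Z and its residual with respect to x.
        recon = [sum(weights[j] * archetypes[j][d] for j in range(k)) for d in range(dim)]
        residual = [recon[d] - x[d] for d in range(dim)]
        # Gradient of the squared error with respect to each weight.
        grad = [2.0 * sum(residual[d] * archetypes[j][d] for d in range(dim)) for j in range(k)]
        # Linear minimisation over the simplex: pick the best vertex.
        best = min(range(k), key=grad.__getitem__)
        # Frank-Wolfe duality gap; stop when no vertex improves the objective.
        gap = sum(g * w for g, w in zip(grad, weights)) - grad[best]
        if gap <= tol:
            break
        step = 2.0 / (t + 2.0)
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step  # move towards the chosen vertex
    return weights
```

Every iterate is a convex combination of simplex vertices, so the weights stay non-negative and sum to one without any projection step. That property is what makes Frank-Wolfe a natural fit for the simplex constraints of archetypal analysis.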
In the Python implementation directory, one can find the implementation of the three algorithms in Python. Moreover, time_comparison.py is a script that compares the performance of the three of them.
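A timing comparison of this kind can be sketched as below. This is only an illustration of the general approach, and time_comparison.py itself may be organised differently. The helper `time_algorithms` and the assumed call signature `fit(data, n_archetypes)` are hypothetical and do not reflect the actual interface of AA_Original, AA_PCHA or AA_Fast.

```python
import time


def time_algorithms(algorithms, data, n_archetypes, repeats=3):
    """Return the best wall-clock time in seconds for each named algorithm.

    `algorithms` maps a display name to a callable taking (data, n_archetypes);
    that signature is an assumption made for this sketch.
    """
    results = {}
    for name, fit in algorithms.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data, n_archetypes)
            timings.append(time.perf_counter() - start)
        # Keep the minimum, which is less sensitive to background noise.
        results[name] = min(timings)
    return results
```

To use it, one would wrap each of the three implementations in a callable with the assumed signature and pass them in as a dictionary keyed by algorithm name.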
Out of the three proposals, the first two were already implemented in R. One can install them by running the following commands:
# Original implementation
install.packages("archetypes")
library("archetypes")
# PCHA implementation
# install_version() comes from the remotes package: install.packages("remotes")
remotes::install_version("archetypal", version = "1.1.1", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library("archetypal")
The adaptation of the Frank-Wolfe algorithm is implemented in the R implementation directory.
One of the main features of archetypal analysis is that the resulting archetypes are interpretable. Taking advantage of this, we have implemented a function to visualize how the weights of each sample are distributed over a set of archetypes. This functionality is available in archetypal_plot.py and produces figures like the following one:
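As a rough illustration of how such a weight plot can be built (not necessarily what archetypal_plot.py does), one common approach places each archetype on a vertex of a regular polygon and draws each sample at the weighted average of those vertices. The helper names `polygon_vertices` and `project_weights` below are hypothetical. The sketch only computes coordinates, which could then be passed to any plotting library.

```python
import math


def polygon_vertices(n_archetypes):
    """Place one vertex per archetype on the unit circle, starting at the top."""
    return [
        (
            math.cos(math.pi / 2 + 2 * math.pi * k / n_archetypes),
            math.sin(math.pi / 2 + 2 * math.pi * k / n_archetypes),
        )
        for k in range(n_archetypes)
    ]


def project_weights(weights, vertices):
    """Map simplex weights to 2D as the weighted average of the vertices."""
    x = sum(w * vx for w, (vx, _) in zip(weights, vertices))
    y = sum(w * vy for w, (_, vy) in zip(weights, vertices))
    return x, y
```

A sample that coincides with one archetype lands on that archetype's vertex, while a sample with equal weights on all archetypes lands at the centre of the polygon. With more than three archetypes the projection is no longer one-to-one, so distinct weight vectors can map to the same point.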
Although further details are provided in my Undergraduate Thesis, the following Figure summarizes the performance comparison of the three algorithms (in Python).
To demonstrate the advantages of archetypal analysis over other unsupervised methods (PCA, k-means), the three methods are compared on two examples. The code is available on Kaggle at the following links: