Evaluating Sparse PCA With ℓ₀ Constraint

Sparse PCA with an ℓ₀ constraint is a combinatorial optimization problem. Instead of explaining the variance of all variables, we select a subset of k variables (equivalent to multiplying by a diagonal matrix of zeros and ones) and explain the maximal variance possible under this constraint.
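
To make the combinatorial nature concrete, here is a minimal brute-force sketch in Python with NumPy. It is an illustration under stated assumptions, not the method used in this repository: the function name `l0_sparse_pca_bruteforce` and the toy Poisson data are hypothetical. For each subset of k variables it takes the largest eigenvalue of the corresponding principal submatrix of the covariance matrix, which is the maximal variance explained by a unit vector supported on that subset.

```python
from itertools import combinations

import numpy as np


def l0_sparse_pca_bruteforce(cov, k):
    """Exhaustively search all k-subsets of variables (hypothetical helper).

    Returns (best_variance, best_support, loading_vector), where the loading
    vector has unit norm and is zero outside the selected support.
    """
    n = cov.shape[0]
    best_val, best_support, best_vec = -np.inf, None, None
    for support in combinations(range(n), k):
        idx = np.array(support)
        # Principal submatrix restricted to the chosen variables.
        sub = cov[np.ix_(idx, idx)]
        # eigh returns eigenvalues in ascending order for symmetric input.
        vals, vecs = np.linalg.eigh(sub)
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_support = support
            best_vec = np.zeros(n)
            best_vec[idx] = vecs[:, -1]
    return best_val, best_support, best_vec


# Toy data (an assumption for illustration): 200 observations of 8 Poisson variables.
rng = np.random.default_rng(0)
data = rng.poisson(lam=5.0, size=(200, 8)).astype(float)
cov = np.cov(data, rowvar=False)
variance, support, loading = l0_sparse_pca_bruteforce(cov, k=3)
```

The search visits every one of the "n choose k" subsets, so it is only feasible for very small n and k; that cost is exactly what makes the ℓ₀-constrained problem combinatorial.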

PCA is often used on datasets where the projection of the data is not actually replicable across experiments. Consider UMAP of single-cell RNA-seq data, based on a 2000 x 2000 covariance matrix, where the observations can be (generously) modeled as independent samples of n Poisson variables (gene transcript quantities). The 10-dimensional subspace selected by PCA for UMAP will not be very similar across replicate experiments, e.g. if the thousands of Poisson observations are resampled (bootstrapping). Instead of an eigenvector (PCA), we can select a linear combination of a small number of variables, which might be a more meaningful basis vector for explaining the underlying process. We can't select k genes with a gene set determination algorithm and expect replicable results, but a linear combination of genes might produce meaningful and replicable results for projecting the data.
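
One way to probe the replicability concern is to compare the PCA subspace of a dataset with that of a bootstrap resample. The sketch below is a hedged illustration, not code from this repository: the helper names `top_subspace` and `principal_angle_cosines`, the sizes (500 observations, 50 variables, 5 dimensions), and the Poisson rate are all assumptions chosen so the example runs quickly. Similarity is measured by the cosines of the principal angles between the two subspaces, where values near 1 mean the subspaces agree and values near 0 mean they are nearly orthogonal.

```python
import numpy as np


def top_subspace(data, dim):
    """Orthonormal basis of the top-`dim` PCA subspace (rows are observations)."""
    cov = np.cov(data, rowvar=False)
    # eigh sorts eigenvalues ascending, so the last `dim` columns are the top ones.
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -dim:]


def principal_angle_cosines(basis_a, basis_b):
    """Cosines of the principal angles between two subspaces, in descending order."""
    return np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)


# Toy stand-in for count data: independent Poisson variables (an assumption).
rng = np.random.default_rng(0)
n_obs, n_vars, dim = 500, 50, 5
data = rng.poisson(lam=2.0, size=(n_obs, n_vars)).astype(float)

reference = top_subspace(data, dim)
# Bootstrap: resample observations (rows) with replacement.
resample = data[rng.integers(0, n_obs, size=n_obs)]
cosines = principal_angle_cosines(reference, top_subspace(resample, dim))
print(cosines)
```

Repeating the resampling step many times and summarizing the cosines gives a rough stability score for the PCA projection; the same comparison could be run on sparse loading vectors to check whether they are more replicable.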

ringw / l0pca
