compression compression-algorithm denoising dimensionality-reduction epca exponential-family interpretability julia machine-learning pca principal-component-analysis reinforcement-learning signal-processing

ExpFamilyPCA.jl

ExpFamilyPCA.jl is a Julia package for exponential family principal component analysis (EPCA). It supports custom objectives and includes fast implementations for several common distributions.

Documentation

For detailed documentation of each function and additional examples, see the documentation at https://sisl.github.io/ExpFamilyPCA.jl/.

Installation

To install the package, use the Julia package manager. In the Julia REPL, type:

```julia
using Pkg; Pkg.add("ExpFamilyPCA")
```

Quickstart

```julia
using ExpFamilyPCA

indim = 5
X = rand(1:100, (10, indim))  # data matrix to compress (10 observations × indim features)
outdim = 3  # target compression dimension

poisson_epca = PoissonEPCA(indim, outdim)

# fit the model to X and return the compressed representation of X
X_compressed = fit!(poisson_epca, X; maxiter=200, verbose=true)

Y = rand(1:100, (3, indim))  # test data
# compress new data with the fitted model
Y_compressed = compress(poisson_epca, Y; maxiter=200, verbose=true)

# map the compressed representations back to the original space
X_reconstructed = decompress(poisson_epca, X_compressed)
Y_reconstructed = decompress(poisson_epca, Y_compressed)
```
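The Quickstart does not spell out what `fit!` optimizes. As a rough, self-contained illustration (not the package's solver), the sketch below assumes the standard EPCA formulation: the natural parameters are factored as $\Theta = AV$ and, for the Poisson model, the loss is $\sum_{ij} e^{\Theta_{ij}} - X_{ij}\Theta_{ij}$, the Poisson objective from the Supported Models table. The helper `fit_poisson_epca`, its plain gradient-descent updates, the step size, and the small count matrix are hypothetical choices made only for this example.

```julia
using Random

# Poisson EPCA loss: sum of G(θ) - xθ over all entries, with G = exp and Θ = A * V.
poisson_loss(X, A, V) = sum(exp.(A * V) .- X .* (A * V))

# Hypothetical helper (not part of ExpFamilyPCA.jl): plain gradient descent
# on the Poisson EPCA loss. Returns codes A (n × outdim) and basis V (outdim × indim).
function fit_poisson_epca(X::AbstractMatrix, outdim::Integer; steps = 1000, lr = 1e-3, seed = 0)
    rng = MersenneTwister(seed)
    n, indim = size(X)
    A = 0.1 .* randn(rng, n, outdim)
    V = 0.1 .* randn(rng, outdim, indim)
    for _ in 1:steps
        R = exp.(A * V) .- X   # ∂loss/∂Θ = g(Θ) - X, with link g = exp
        gradA = R * V'         # chain rule through Θ = A * V
        gradV = A' * R
        A .-= lr .* gradA
        V .-= lr .* gradV
    end
    return A, V
end

X = Float64.(rand(MersenneTwister(1), 1:10, 10, 5))  # small 10 × 5 count matrix
A, V = fit_poisson_epca(X, 3)
X_reconstructed = exp.(A * V)  # map back through the link g = exp
println("loss after fitting: ", poisson_loss(X, A, V))
```

Here `A` plays a role analogous to `X_compressed`, and `exp.(A * V)` to the output of `decompress`; that correspondence is an assumption about the package's conventions, not documented behavior.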

Supported Models

| Distribution | `ExpFamilyPCA.jl` | Objective | Link Function $g(\theta)$ |
|---|---|---|---|
| Bernoulli | `BernoulliEPCA` | $\log(1 + e^{\theta-2x\theta})$ | $\frac{e^\theta}{1+e^\theta}$ |
| Binomial | `BinomialEPCA` | $n \log(1 + e^\theta) - x\theta$ | $\frac{ne^\theta}{1+e^\theta}$ |
| Continuous Bernoulli | `ContinuousBernoulliEPCA` | $\log\Bigg(\frac{e^\theta -1}{\theta}\Bigg) - x\theta$ | $\frac{\theta - 1}{\theta} + \frac{1}{e^\theta - 1}$ |
| Gamma¹ | `GammaEPCA` or `ItakuraSaitoEPCA` | $-\log(-\theta) - x\theta$ | $-1/\theta$ |
| Gaussian² | `GaussianEPCA` or `NormalEPCA` | $\frac{1}{2}(x - \theta)^2$ | $\theta$ |
| Negative Binomial | `NegativeBinomialEPCA` | $-r \log(1 - e^\theta) - x\theta$ | $\frac{-re^\theta}{e^\theta - 1}$ |
| Pareto | `ParetoEPCA` | $-\log(-1-\theta) + \theta \log m - x \theta$ | $\log m - \frac{1}{\theta+1}$ |
| Poisson³ | `PoissonEPCA` | $e^\theta - x \theta$ | $e^\theta$ |
| Weibull | `WeibullEPCA` | $-\log(-\theta) - x \theta$ | $-1/\theta$ |

¹: The gamma EPCA objective is equivalent to minimizing the Itakura-Saito distance.

²: The Gaussian EPCA objective is equivalent to standard PCA.

³: The Poisson EPCA objective is equivalent to minimizing the generalized KL divergence.
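For every row written as $G(\theta) - x\theta$, the link function should be the derivative of the log-partition $G$. A quick finite-difference check in plain Julia (no package needed) can confirm the table. The constants `n`, `r`, `m` and the test point `θ0` are arbitrary values chosen for illustration. Bernoulli is omitted because its objective is written in a different form, Weibull is omitted because it shares the gamma form, and the Gaussian row uses $G(\theta) = \theta^2/2$, which matches $\frac{1}{2}(x - \theta)^2$ up to the $\theta$-independent term $x^2/2$.

```julia
# Illustrative constants: binomial trials n, negative binomial r, Pareto scale m.
n, r, m = 10, 4, 2.0

# (name, log-partition G, link g), transcribed from the table; objective = G(θ) - xθ.
models = [
    ("Binomial",             θ -> n * log(1 + exp(θ)),       θ -> n * exp(θ) / (1 + exp(θ))),
    ("Continuous Bernoulli", θ -> log((exp(θ) - 1) / θ),     θ -> (θ - 1) / θ + 1 / (exp(θ) - 1)),
    ("Gamma",                θ -> -log(-θ),                  θ -> -1 / θ),
    ("Gaussian",             θ -> θ^2 / 2,                   θ -> θ),
    ("Negative Binomial",    θ -> -r * log(1 - exp(θ)),      θ -> -r * exp(θ) / (exp(θ) - 1)),
    ("Pareto",               θ -> -log(-1 - θ) + θ * log(m), θ -> log(m) - 1 / (θ + 1)),
    ("Poisson",              θ -> exp(θ),                    θ -> exp(θ)),
]

# Central-difference approximation of G'(θ).
central_diff(G, θ; h = 1e-6) = (G(θ + h) - G(θ - h)) / (2h)

# θ0 = -1.5 lies in every domain above: gamma and negative binomial need θ < 0,
# and Pareto needs θ < -1.
θ0 = -1.5
for (name, G, g) in models
    println(rpad(name, 22), "G'(θ0) ≈ ", central_diff(G, θ0), "   g(θ0) = ", g(θ0))
end
```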

Custom Distributions

When working with custom distributions, some specifications are often more convenient than others. For example, writing the log-partition of the gamma distribution $G(\theta) = -\log(-\theta)$ and its derivative $g(\theta) = -1 / \theta$ is much simpler than coding the Itakura-Saito distance

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigg[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1\Bigg] d\omega $$

efficiently in Julia, even though the two are equivalent.
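To see why the two are equivalent, take the integrand as a scalar divergence $d_{IS}(x, y) = x/y - \log(x/y) - 1$ and set $y = g(\theta) = -1/\theta$. Then $d_{IS}(x, g(\theta)) = -\log(-\theta) - x\theta - \log x - 1$, so the gamma objective from the table differs from the Itakura-Saito term only by $\log x + 1$, which does not depend on $\theta$ and therefore does not move the minimizer. The snippet below checks this numerically; the function names and test values are illustrative.

```julia
# Scalar Itakura-Saito divergence between a target x and a prediction y (both > 0).
itakura_saito(x, y) = x / y - log(x / y) - 1

# Gamma EPCA objective from the table, G(θ) - xθ with G(θ) = -log(-θ); needs θ < 0.
gamma_objective(x, θ) = -log(-θ) - x * θ

# Gamma link function: maps the natural parameter θ < 0 to the mean -1/θ > 0.
gamma_link(θ) = -1 / θ

x = 3.0
for θ in (-0.1, -0.5, -2.0)
    gap = gamma_objective(x, θ) - itakura_saito(x, gamma_link(θ))
    println("θ = $θ: gap = $gap, log(x) + 1 = $(log(x) + 1)")
end
```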

ExpFamilyPCA.jl includes 10 constructors for custom distributions. All constructors are theoretically equivalent, though some may be faster in practice. To showcase each one, we walk through how to construct a Poisson EPCA instance with every constructor. First, we provide a quick recap on notation.

- $G$ is the log-partition function. $G$ is strictly convex and continuously differentiable.
- $g$ is the link function. It is the derivative of the log-partition, $\nabla_\theta G(\theta) = g(\theta)$, and the inverse of the derivative of the convex conjugate of the log-partition, $g = f^{-1}$.
- $F$ is the convex conjugate (under the Legendre transform) of the log-partition, $F = G^*$.
- $f$ is the derivative of the convex conjugate, $\nabla_x F(x) = f(x)$, and the inverse of the link function, $f = g^{-1}$.
- $B_F(p | q)$ is the Bregman divergence induced by $F$.
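The relationships in this list can be written as two small higher-order helpers. `bregman` and `legendre_from_conjugate` are hypothetical names used only for this sketch. The Gaussian case ($G(\theta) = \theta^2/2$, $F(x) = x^2/2$) is used because every object has a simple closed form; its Bregman divergence reduces to half the squared error, which is why Gaussian EPCA matches ordinary PCA.

```julia
# Bregman divergence induced by a convex F with derivative f:
#   B_F(p | q) = F(p) - F(q) - f(q) * (p - q)
bregman(F, f) = (p, q) -> F(p) - F(q) - f(q) * (p - q)

# Legendre transform evaluated at its maximiser x = g(θ):
#   G(θ) = θ * g(θ) - F(g(θ))
legendre_from_conjugate(F, g) = θ -> θ * g(θ) - F(g(θ))

# Gaussian case: G(θ) = θ^2 / 2, g(θ) = θ, F(x) = x^2 / 2, f(x) = x.
G(θ) = θ^2 / 2
g(θ) = θ
F(x) = x^2 / 2
f(x) = x

B = bregman(F, f)
G_recovered = legendre_from_conjugate(F, g)

θ, p, q = 1.3, 2.0, -0.5
println(f(g(θ)) == θ)             # true: f undoes g
println(G_recovered(θ) ≈ G(θ))    # true: Legendre duality recovers G from F and g
println(B(p, q) ≈ (p - q)^2 / 2)  # true: half the squared error
```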

For the Poisson distribution, these terms take the following values.

| Term | Math | Julia |
|---|---|---|
| $G(\theta)$ | $e^\theta$ | `G = exp` |
| $g(\theta)$ | $e^\theta$ | `g = exp` |
| $F(x)$ | $x \log x - x$ | `F(x) = x * log(x) - x` |
| $f(x)$ | $\log x$ | `f(x) = log(x)` |
| $B_F(p \| q)$ | $p \log(p/q) + q - p$ | `B(p, q) = p * log(p / q) + q - p` |
| $B_F(x \| g(\theta))$ | $e^\theta - x\theta + x \log x - x$ | `Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x` |
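The rows of this table are not independent: substituting $q = g(\theta)$ into $B_F(p | q)$ gives the last row, and that last row differs from the Poisson objective $e^\theta - x\theta$ only by $F(x) = x \log x - x$, which does not depend on $\theta$. A short check in plain Julia, using the table's definitions with `exp(θ)` for $e^\theta$ (Julia's Euler constant is `ℯ`, not `e`), confirms both identities at a few arbitrary points.

```julia
# Poisson entries from the table.
G(θ) = exp(θ)                      # log-partition
g(θ) = exp(θ)                      # link
F(x) = x * log(x) - x              # convex conjugate of G
f(x) = log(x)                      # inverse link
B(p, q) = p * log(p / q) + q - p   # Bregman divergence induced by F
Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x

x = 4.0
for θ in (-1.0, 0.0, 1.5)
    # 1. Substituting q = g(θ) into B gives Bg.
    @assert B(x, g(θ)) ≈ Bg(x, θ)
    # 2. Bg differs from the objective G(θ) - xθ only by F(x), which is constant in θ.
    @assert Bg(x, θ) - (G(θ) - x * θ) ≈ F(x)
end
println("table entries are mutually consistent")
```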

The Bregman divergence can also be specified using Distances.jl:

```julia
using Distances

B = Distances.gkl_divergence
```

Constructors

```julia
EPCA(indim, outdim, F, g, Val((:F, :g)))
EPCA(indim, outdim, F, f, Val((:F, :f)))
EPCA(indim, outdim, F, Val((:F)))
EPCA(indim, outdim, F, G, Val((:F, :G)))
EPCA(indim, outdim, G, g, Val((:G, :g)))
EPCA(indim, outdim, G, Val((:G)))
EPCA(indim, outdim, B, g, Val((:B, :g)))
EPCA(indim, outdim, B, G, Val((:B, :G)))
EPCA(indim, outdim, Bg, g, Val((:Bg, :g)))
EPCA(indim, outdim, Bg, G, Val((:Bg, :G)))
```
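Constructors that receive less information must derive the missing pieces. For instance, given only $F$, the link $g$ has to be recovered as the inverse of $f = F'$. How the package does this internally is not described here; the sketch below shows one plausible route, assuming a central-difference derivative and bisection for the inversion, applied to the Poisson $F(x) = x \log x - x$. `fd_derivative`, `invert_increasing`, and the search bracket `[1e-8, 1e8]` are hypothetical choices.

```julia
# Only F is supplied (Poisson example): F(x) = x log x - x, defined for x > 0.
F(x) = x * log(x) - x

# Step 1: approximate f = F' with a central difference.
# The step is relative to x so that x - h stays inside the domain x > 0.
fd_derivative(F, x; h = 1e-6 * x) = (F(x + h) - F(x - h)) / (2h)
f_numeric(x) = fd_derivative(F, x)

# Step 2: recover the link g = f⁻¹ by bisection. This assumes f is increasing
# (true when F is strictly convex) and that the solution lies in [lo, hi].
function invert_increasing(f, y; lo = 1e-8, hi = 1e8, iters = 200)
    for _ in 1:iters
        mid = (lo + hi) / 2
        if f(mid) < y
            lo = mid   # solution lies to the right of mid
        else
            hi = mid   # solution lies at or to the left of mid
        end
    end
    return (lo + hi) / 2
end

g_numeric(θ) = invert_increasing(f_numeric, θ)

for θ in (-2.0, 0.0, 1.5)
    println("θ = $θ: numeric g = $(g_numeric(θ)), exact exp(θ) = $(exp(θ))")
end
```

Each evaluation of `g_numeric` costs hundreds of evaluations of `F`, which gives one intuition for why supplying $g$ (or $G$) directly can be faster in practice.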

Tips and Tricks

Metaprogramming

Dropping Constants

Selecting Constructors

Sobol Initialization

Contributing

Contributions are welcome! If you want to contribute, please fork the repository, create a new branch, and submit a pull request. Before contributing, please make sure to update tests as appropriate.

About

A Julia package for exponential family principal component analysis (EPCA).

https://sisl.github.io/ExpFamilyPCA.jl/


MIT License
