ExpFamilyPCA.jl is a Julia package for performing exponential family principal component analysis (EPCA). ExpFamilyPCA.jl supports custom objectives and includes fast implementations for several common distributions.
For detailed documentation on each function and additional examples, please refer to the documentation.
To install the package, use the Julia package manager. In the Julia REPL, type:
```julia
using Pkg; Pkg.add("ExpFamilyPCA")
```
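The following example fits a Poisson EPCA model to count data, compresses new data with the fitted model, and reconstructs both:
```julia
using ExpFamilyPCA

indim = 5
X = rand(1:100, (10, indim))  # data matrix to compress (10 observations × 5 features)

outdim = 3  # target compression dimension
poisson_epca = PoissonEPCA(indim, outdim)

# Fit the model on X and return the compressed training data
X_compressed = fit!(poisson_epca, X; maxiter=200, verbose=true)

Y = rand(1:100, (3, indim))  # test data
# Compress unseen data with the fitted model
Y_compressed = compress(poisson_epca, Y; maxiter=200, verbose=true)

# Map the compressed data back to the original space
X_reconstructed = decompress(poisson_epca, X_compressed)
Y_reconstructed = decompress(poisson_epca, Y_compressed)
```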
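For intuition about what `fit!` optimizes, here is a minimal pure-Julia sketch of Poisson EPCA. It is not the package's implementation: it assumes the unregularized objective $\sum_{ij} \left(e^{\Theta_{ij}} - X_{ij}\Theta_{ij}\right)$ with natural parameters $\Theta = AV$, it uses plain gradient descent in place of whatever optimizer ExpFamilyPCA.jl actually uses, and `poisson_epca_loss` and `toy_fit` are hypothetical names. Here `A` (one row per observation) plays the role of the compressed data and `V` the role of the learned basis.

```julia
using Random

# Hypothetical Poisson EPCA loss with natural parameters Θ = A * V:
# the sum of exp(θ) - x * θ over all entries (the Poisson negative
# log-likelihood up to terms that do not depend on Θ).
function poisson_epca_loss(X, A, V)
    Θ = A * V
    return sum(exp.(Θ) .- X .* Θ)
end

# Hypothetical toy fitter: plain gradient descent on A (n × k) and V (k × d).
# The gradient of the loss with respect to Θ is exp.(Θ) .- X.
function toy_fit(X, k; iters = 2000, lr = 1e-3, rng = MersenneTwister(0))
    n, d = size(X)
    A = 0.1 .* randn(rng, n, k)
    V = 0.1 .* randn(rng, k, d)
    for _ in 1:iters
        R = exp.(A * V) .- X   # ∂loss/∂Θ
        gA = R * V'            # ∂loss/∂A
        gV = A' * R            # ∂loss/∂V (computed before A is updated)
        A .-= lr .* gA
        V .-= lr .* gV
    end
    return A, V
end

X = Float64.(rand(MersenneTwister(1), 1:5, 10, 5))  # small counts keep exp(Θ) tame
A, V = toy_fit(X, 3)

# Baseline: all-zero natural parameters, i.e. every entry predicted as exp(0) = 1.
loss_at_zero = poisson_epca_loss(X, zeros(10, 3), zeros(3, 5))
loss_fitted = poisson_epca_loss(X, A, V)
println("loss at Θ = 0: ", loss_at_zero, ", after fitting: ", loss_fitted)
```

The small counts and step size are chosen so that plain gradient descent stays stable; larger counts, like the `1:100` range in the example above, would need a smaller step or a more careful optimizer.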
Distribution | ExpFamilyPCA.jl |
---|---|
Bernoulli | `BernoulliEPCA` |
Binomial | `BinomialEPCA` |
Continuous Bernoulli | `ContinuousBernoulliEPCA` |
Gamma¹ | `GammaEPCA` or `ItakuraSaitoEPCA` |
Gaussian² | `GaussianEPCA` or `NormalEPCA` |
Negative Binomial | `NegativeBinomialEPCA` |
Pareto | `ParetoEPCA` |
Poisson³ | `PoissonEPCA` |
Weibull | `WeibullEPCA` |
¹ The gamma EPCA objective is equivalent to minimizing the Itakura-Saito distance.
² The Gaussian EPCA objective is equivalent to the usual PCA objective.
³ The Poisson EPCA objective is equivalent to minimizing the generalized KL divergence.
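All three equivalences come from the Bregman divergence $B_F(p \Vert q) = F(p) - F(q) - f(q)(p - q)$ for different choices of $F$. The sketch below checks this numerically in plain Julia; `bregman` and the three wrapper names are hypothetical helpers, not package functions. $F(x) = x^2/2$ gives half the squared error, $F(x) = -\log x$ gives the Itakura-Saito distance, and $F(x) = x\log x - x$ gives the generalized KL divergence.

```julia
# Hypothetical helper: scalar Bregman divergence induced by a convex F
# with derivative f, B_F(p ‖ q) = F(p) - F(q) - f(q) * (p - q).
bregman(F, f, p, q) = F(p) - F(q) - f(q) * (p - q)

# Gaussian: F(x) = x^2 / 2 yields (p - q)^2 / 2, the squared-error loss of PCA.
gaussian_div(p, q) = bregman(x -> x^2 / 2, x -> x, p, q)

# Gamma: F(x) = -log(x) yields the Itakura-Saito distance p/q - log(p/q) - 1.
itakura_saito(p, q) = bregman(x -> -log(x), x -> -1 / x, p, q)

# Poisson: F(x) = x * log(x) - x yields the generalized KL divergence.
gen_kl(p, q) = bregman(x -> x * log(x) - x, log, p, q)

p, q = 2.0, 0.5
println(gaussian_div(p, q), "  vs  ", (p - q)^2 / 2)
println(itakura_saito(p, q), "  vs  ", p / q - log(p / q) - 1)
println(gen_kl(p, q), "  vs  ", p * log(p / q) + q - p)
```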
When working with custom distributions, some specifications are often more convenient than others. For example, the gamma distribution can be specified through its log-partition $G(\theta) = -\log(-\theta)$ or through the Itakura-Saito distance it induces, and one can be much easier to write efficiently in Julia than the other even though the two are equivalent.
ExpFamilyPCA.jl includes 10 constructors for custom distributions. All constructors are theoretically equivalent, though some may be faster in practice. To showcase each one, we construct a Poisson EPCA instance with every constructor. First, a quick recap of notation.
- $G$ is the log-partition function. $G$ is strictly convex and continuously differentiable.
- $g$ is the link function. It is the derivative of the log-partition, $\nabla_\theta G(\theta) = g(\theta)$, and the inverse of the derivative of the convex conjugate of the log-partition, $g = f^{-1}$.
- $F$ is the convex conjugate (under the Legendre transform) of the log-partition, $F = G^*$, i.e. $F(x) = \sup_\theta \left[x\theta - G(\theta)\right]$.
- $f$ is the derivative of the convex conjugate, $\nabla_x F(x) = f(x)$, and the inverse of the link function, $f = g^{-1}$.
- $B_F(p \Vert q) = F(p) - F(q) - f(q)(p - q)$ is the Bregman divergence induced by $F$.
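To make these relationships concrete for a distribution other than the Poisson example used below, here is a sketch for the Bernoulli distribution, whose log-partition is $G(\theta) = \log(1 + e^\theta)$. It checks numerically that $g = \nabla G$ (via a central difference), that $f = g^{-1}$, and that $F = G^*$ (by maximizing $x\theta - G(\theta)$ over a grid of $\theta$ values). The `_bern` suffixes, the helper names, the grid, and the step sizes are illustrative choices, not part of the package.

```julia
# Bernoulli terms: natural parameter θ ∈ ℝ, mean x ∈ (0, 1).
G_bern(θ) = log1p(exp(θ))                      # log-partition G
g_bern(θ) = 1 / (1 + exp(-θ))                  # link g = G' (logistic)
F_bern(x) = x * log(x) + (1 - x) * log1p(-x)   # convex conjugate F = G*
f_bern(x) = log(x / (1 - x))                   # f = F' (logit), the inverse of g

# Central-difference derivative, used to check g = G'.
central_diff(h, θ; ϵ = 1e-6) = (h(θ + ϵ) - h(θ - ϵ)) / (2ϵ)

# Legendre transform approximated on a grid: F(x) ≈ max over θ of x*θ - G(θ).
legendre_on_grid(G, x; θs = -20:1e-3:20) = maximum(θ -> x * θ - G(θ), θs)

for θ in (-3.0, 0.0, 2.5)
    println("θ = ", θ, ": g(θ) = ", g_bern(θ), ", G'(θ) ≈ ", central_diff(G_bern, θ),
            ", f(g(θ)) = ", f_bern(g_bern(θ)))
end
for x in (0.1, 0.5, 0.9)
    println("x = ", x, ": F(x) = ", F_bern(x), ", grid sup ≈ ", legendre_on_grid(G_bern, x))
end
```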
For the Poisson distribution, these terms take the following values.
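Term | Math | Julia |
---|---|---|
Log-partition | $G(\theta) = e^\theta$ | `G = exp` |
Link function | $g(\theta) = e^\theta$ | `g = exp` |
Convex conjugate | $F(x) = x \log x - x$ | `F(x) = x * log(x) - x` |
Derivative of the convex conjugate | $f(x) = \log x$ | `f(x) = log(x)` |
Bregman divergence | $B_F(p \Vert q) = p \log(p / q) + q - p$ | `B(p, q) = p * log(p / q) + q - p` |
Bregman divergence at the link | $B_F(x \Vert g(\theta)) = e^\theta - x\theta + x \log x - x$ | `Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x` |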
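As a sanity check on the table, the following sketch evaluates the Poisson terms at a few arbitrary points and asserts the identities from the notation recap: $f = g^{-1}$, the Legendre identity $F(x) = x f(x) - G(f(x))$, that `B` is the Bregman divergence of `F`, and that `Bg(x, θ)` equals `B(x, g(θ))`.

```julia
# Poisson terms, as in the table above.
G = exp
g = exp
F(x) = x * log(x) - x
f(x) = log(x)
B(p, q) = p * log(p / q) + q - p
Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x

for θ in (-1.0, 0.0, 2.0), x in (0.5, 3.0), q in (0.2, 4.0)
    @assert f(g(θ)) ≈ θ                              # f is the inverse of g
    @assert F(x) ≈ x * f(x) - G(f(x))                # F = G* (sup attained at θ = f(x))
    @assert B(x, q) ≈ F(x) - F(q) - f(q) * (x - q)   # B is the Bregman divergence of F
    @assert Bg(x, θ) ≈ B(x, g(θ))                    # Bg(x, θ) = B(x ‖ g(θ))
end
println("all Poisson identities hold at the sampled points")
```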
The Bregman divergence can also be specified using Distances.jl:
```julia
using Distances
B = Distances.gkl_divergence
```
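Distances.jl's `gkl_divergence` computes the generalized KL divergence between two arrays, which equals the scalar Poisson divergence `B` from the table summed over all entries. A dependency-free sketch of that relationship, assuming same-size arrays with positive entries and using the hypothetical name `gkl`:

```julia
# Scalar Poisson Bregman divergence from the table above.
B(p, q) = p * log(p / q) + q - p

# Hypothetical dependency-free generalized KL divergence between two arrays
# with positive entries: broadcast B over the entries and sum the results.
gkl(a, b) = sum(B.(a, b))

a = [1.0, 2.0, 3.0]
b = [2.0, 2.0, 1.0]
println(gkl(a, b))  # should match Distances.gkl_divergence(a, b)
```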
Each constructor takes the input and output dimensions, the functions it is given, and a `Val` tag naming those functions:
```julia
EPCA(indim, outdim, F, g, Val((:F, :g)))
EPCA(indim, outdim, F, f, Val((:F, :f)))
EPCA(indim, outdim, F, Val((:F)))
EPCA(indim, outdim, F, G, Val((:F, :G)))
EPCA(indim, outdim, G, g, Val((:G, :g)))
EPCA(indim, outdim, G, Val((:G)))
EPCA(indim, outdim, B, g, Val((:B, :g)))
EPCA(indim, outdim, B, G, Val((:B, :G)))
EPCA(indim, outdim, Bg, g, Val((:Bg, :g)))
EPCA(indim, outdim, Bg, G, Val((:Bg, :G)))
```
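The constructors describe the same model and differ in which pieces are supplied and which must be derived. As one illustration of why some forms may be slower, here is a hypothetical sketch (not ExpFamilyPCA.jl's code) that recovers the Poisson link $g$ from $F$ alone, as an $F$-only specification would in principle require: it differentiates $F$ numerically to get $f$ and then inverts $f$ by bisection. `deriv`, `invert`, and `g_from_F` are made-up names, and the bracket and step sizes are assumptions suited to this example.

```julia
# Poisson convex conjugate, as in the table above.
F(x) = x * log(x) - x

# Central difference with a step relative to x, so x ± step stays positive.
deriv(h, x; δ = 1e-6) = (h(x * (1 + δ)) - h(x * (1 - δ))) / (2 * x * δ)

# Invert an increasing function f on [lo, hi] by bisection: find x with f(x) = θ.
function invert(f, θ; lo = 1e-12, hi = 1e6, iters = 200)
    for _ in 1:iters
        mid = (lo + hi) / 2
        if f(mid) < θ
            lo = mid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

# g = f⁻¹ with f ≈ F', so g_from_F should approximate exp.
g_from_F(θ) = invert(x -> deriv(F, x), θ)

for θ in (-2.0, 0.0, 3.0)
    println("θ = ", θ, ": g_from_F(θ) = ", g_from_F(θ), ", exp(θ) = ", exp(θ))
end
```

Each evaluation of `g_from_F` costs hundreds of evaluations of `F`, whereas supplying `g = exp` directly costs one, which is the kind of trade-off the equivalent constructors expose.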
Contributions are welcome! If you want to contribute, please fork the repository, create a new branch, and submit a pull request. Before contributing, please make sure to update tests as appropriate.