gymrek-lab / CAHMML

Custom Lambda HMM Library

Ryan Eveloff, Denghui Chen

CλHMML: Custom Lambda HMM Library

CλHMML (aka cahmml, pronounced "camel") is a lightweight library meant to simplify building complex hidden Markov models (HMMs). We provide two abstract classes, Observation and State, which, once implemented, run seamlessly in a parallelized HMM structure built on NumPy matrices.

Motivation

During our research into multimodal genetic HMMs, we found that most plug-and-play HMM libraries require a single transition matrix $T$ and a single, finite-library emission matrix $E$ as input. In our case, we needed a scalable, multi-sample HMM architecture that could operate with a Bayesian model at each timestep. After asking our colleagues, we found that many labs simply rewrite the boilerplate code for running an HMM each time their research requires one. To save time and make HMMs a simple and efficient interface for unsupervised language modeling, we created CλHMML.

Installation

Install cahmml from PyPI using the following command:

pip3 install cahmml

Usage

Importing CλHMML

from cahmml import hmm

If necessary, you can also import the CλHMML utilities via cahmml.util, though most users will not need them.

Initializing an HMM

State Abstract Class

An implementation of hmm.State requires two methods to be implemented:

  • transition_probability
  • emission_probability
# State class
from typing import Iterable

import numpy as np
from cahmml import hmm

class MyState(hmm.State):

  def emission_probability(self, obs: Iterable[hmm.Observation], t: int, hyperparameters: dict = {}) -> np.ndarray:
    # Return P(obs | self, t, hyperparameters)
    ...

  def transition_probability(self, next: "hmm.State", obs: Iterable[hmm.Observation], t: int, hyperparameters: dict = {}) -> np.ndarray:
    # Return P(next | self, obs, t, hyperparameters)
    ...
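
As a concrete illustration of the two methods, the sketch below defines a Gaussian-emission state with a "sticky" transition. It is deliberately self-contained: `StateSketch` is a hypothetical stand-in that only mirrors the method signatures shown above, not cahmml's actual hmm.State. Reading `obs` as one observation per sample at timestep `t` (so that each method returns one probability per sample) is an assumption, and the `mean`, `std`, and `stay` names are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional

import numpy as np


class StateSketch(ABC):
    """Hypothetical stand-in mirroring the two hmm.State methods above."""

    @abstractmethod
    def emission_probability(self, obs: Iterable[float], t: int,
                             hyperparameters: Optional[dict] = None) -> np.ndarray: ...

    @abstractmethod
    def transition_probability(self, next: "StateSketch", obs: Iterable[float], t: int,
                               hyperparameters: Optional[dict] = None) -> np.ndarray: ...


class GaussianState(StateSketch):
    def __init__(self, mean: float, std: float = 1.0):
        self.mean = mean
        self.std = std

    def emission_probability(self, obs, t, hyperparameters=None):
        # Assumption: obs holds one float per sample at timestep t.
        x = np.asarray(list(obs), dtype=float)
        z = (x - self.mean) / self.std
        return np.exp(-0.5 * z ** 2) / (self.std * np.sqrt(2.0 * np.pi))

    def transition_probability(self, next, obs, t, hyperparameters=None):
        # "stay" is a made-up hyperparameter: the probability of remaining in
        # this state. Using 1 - stay for the other state assumes exactly two states.
        stay = (hyperparameters or {}).get("stay", 0.9)
        p = stay if next is self else 1.0 - stay
        return np.full(len(list(obs)), p)


low, high = GaussianState(mean=0.0), GaussianState(mean=5.0)
obs_t = [0.1, 4.8]  # one observation per sample at this timestep
print(low.emission_probability(obs_t, t=0))
print(low.transition_probability(high, obs_t, t=0))
```

Because both methods receive `t` and the observations, the probabilities can change at every timestep, which is the flexibility a single fixed $T$ and $E$ cannot express.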

Observation Abstract Class

An implementation of hmm.Observation requires no methods to be implemented; it serves as a customizable passthrough class for hmm.State. You can even use built-in classes like int or str! In the case below, we use a simple str wrapper.

# Observation class
class MyObservation(hmm.Observation):

  def __init__(self, value: str):
    self.v = value

Filling Samples with Observations

Pass in a sample_id and an iterable of hmm.Observation to create a sample.

# Given list[hmm.Observation] obs
my_first_sample = hmm.Sample("first sample!", obs)

Running an HMM

Assuming you've already implemented hmm.State and hmm.Observation, running Viterbi on your HMM with a given input is convenient and fast!

# Given list[hmm.State] states, list[hmm.Sample] samples, and list[float] initial_probs
model = hmm.HMM(states)
model.fit(samples,initial_probs)
pred_states = model.viterbi()

This code yields an array containing the Viterbi-predicted state of each sample at each observation.

Note: Advanced users can specify hyperparameters for the emission and transition functions via e_hparams and t_hparams!
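
To make the Viterbi output concrete, here is a minimal NumPy sketch of Viterbi decoding with a separate transition matrix at every timestep, batched over samples. It is not cahmml's implementation; the function name `viterbi_sketch` and the array layouts (`log_T[t, k, i, j]`, `log_E[t, k, j]`) are assumptions chosen for illustration.

```python
import numpy as np


def viterbi_sketch(log_init, log_T, log_E):
    """Most likely state path per sample (illustrative, not cahmml's code).

    log_init: (s,)            log initial state probabilities
    log_T:    (n-1, m, s, s)  log_T[t, k, i, j] = log P(state j at t+1 | state i at t), sample k
    log_E:    (n, m, s)       log_E[t, k, j]    = log P(observation t of sample k | state j)
    Returns an (m, n) integer array of state indices.
    """
    n, m, s = log_E.shape
    score = log_init[None, :] + log_E[0]            # (m, s)
    backptr = np.zeros((n, m, s), dtype=np.int64)   # kept in full for backtracking

    for t in range(1, n):                           # the only Python-level loop over time
        # cand[k, i, j]: best score ending in state i at t-1, then moving i -> j
        cand = score[:, :, None] + log_T[t - 1]     # (m, s, s)
        backptr[t] = cand.argmax(axis=1)            # best predecessor i for each j
        score = cand.max(axis=1) + log_E[t]         # (m, s)

    # Backtrack from the best final state of each sample.
    path = np.empty((m, n), dtype=np.int64)
    path[:, -1] = score.argmax(axis=1)
    rows = np.arange(m)
    for t in range(n - 1, 0, -1):
        path[:, t - 1] = backptr[t, rows, path[:, t]]
    return path


# Toy input: 2 sticky states, 2 samples, 4 observations each.
T = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
favored = np.array([[0, 0, 1, 1], [1, 1, 0, 0]])    # state each observation favors, (m, n)
m, n = favored.shape
E = np.full((n, m, 2), 0.1)
E[np.arange(n)[:, None], np.arange(m)[None, :], favored.T] = 0.9

pred_states = viterbi_sketch(np.log([0.5, 0.5]), np.broadcast_to(T, (n - 1, m, 2, 2)), np.log(E))
print(pred_states)  # one row per sample, one column per observation
```

Here the same transition matrix is broadcast to every timestep and sample for brevity; supplying a different `log_T[t, k]` per timestep is what distinguishes this setting from a classic single-matrix HMM.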

Addendum: Complexity Analysis

Filling $T$ and $E$ runs in $\mathcal{O}(m \cdot n \cdot s \cdot f)$ time, where $m$ is the number of samples, $n$ is the number of observations per sample, $s$ is the number of states, and $f$ is the maximum runtime of transition_probability and emission_probability. NumPy parallelization allows Viterbi runtime to scale linearly with the number of observations, i.e. $\mathcal{O}(n)$.
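
The $\mathcal{O}(m \cdot n \cdot s \cdot f)$ term can be read off a direct fill loop. The sketch below is illustrative only and makes no claim about how cahmml fills its matrices: `fill_emissions` and the toy `emit` callback are hypothetical, and transitions are omitted for brevity.

```python
import numpy as np


def fill_emissions(observations, states, emit):
    """Build E with E[t, k, j] = emit(states[j], observations[k, t], t).

    observations: (m, n) array, one row per sample
    emit:         callable evaluated once per (timestep, sample, state)
    Returns (E, number of emit calls).
    """
    m, n = observations.shape
    s = len(states)
    E = np.empty((n, m, s))
    calls = 0
    for t in range(n):
        for k in range(m):
            for j, state in enumerate(states):
                E[t, k, j] = emit(state, observations[k, t], t)
                calls += 1
    return E, calls


obs = np.array([[0.0, 1.0, 5.0], [4.0, 5.0, 6.0]])  # m = 2 samples, n = 3 observations
state_means = [0.0, 5.0]                            # s = 2 toy states


def emit(mean, x, t):
    return float(np.exp(-0.5 * (x - mean) ** 2))    # unnormalised Gaussian score


E, calls = fill_emissions(obs, state_means, emit)
print(E.shape, calls)  # (3, 2, 2) 12
```

The callback runs $m \cdot n \cdot s = 2 \cdot 3 \cdot 2 = 12$ times, so if each call costs at most $f$, the fill costs $\mathcal{O}(m \cdot n \cdot s \cdot f)$.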

Space complexity has been reduced to $\mathcal{O}(m \cdot s^2)$ for most of the runtime. Viterbi's space complexity has been left unreduced to keep backtracking simple, though compression could be implemented to bring it down to $\Theta(m \cdot n)$, the floor required to return the final state predictions.

More anecdotally, we expect a run with 100 states, 100 samples, 1,000,000 observations, and constant-time $T$ and $E$ functions to finish in under an hour on consumer-grade hardware.

Testing

Coverage reports are available in our test branch. For simple HMM testing, we validated output against hmmlearn, a scikit-learn-style HMM library. For complex HMM testing, we used small, hand-reproducible examples.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
