maxmouchet / HMMBase.jl

Hidden Markov Models for Julia.


HMM with observations as probabilities

AndreasAZiegler opened this issue · comments

I would like to solve exercises c) and d) of the attached problem with HMMBase.jl. I'm quite new to HMMs, but as I understand it, HMMBase needs observation values to run the forward, backward, and Viterbi algorithms, whereas the problem only provides probabilities. It therefore seems I can't solve this problem with HMMBase. Please let me know if I'm misunderstanding the problem and/or the usage of HMMBase.jl.

hmm_example.pdf

Hi,

In your case, the emission matrix O gives the probabilities P(Y_t = k | Z_t = i) = O_{ik}.
To model that with Distributions.jl you can use a Categorical distribution (one per state).
It is parameterized by a probability vector whose element i is the probability of symbol i.

See the examples below.
Hope this helps!

Maxime

julia> using Distributions

help?> Categorical
search: Categorical ncategories

  Categorical(p)

  A Categorical distribution is parameterized by a probability vector p (of length K).

  P(X = k) = p[k]   for k = 1, 2, …, K.

using Distributions
using HMMBase

# Initial Probabilities
a = [0.3, 0.2, 0.5]

# Transition Matrix
A = [0 0.2 0.8; 1 0 0; 0.4 0.4 0.2]

# Observations Distributions (for each state)
B1 = Categorical([0, 0.5, 0.5])
B2 = Categorical([1, 0, 0])
B3 = Categorical([0.1, 0.2, 0.7])

# Example: probability of observing O3 in state 1
@show pdf(B1, 3)
# pdf(B1, 3) = 0.5

# Build the HMM
hmm = HMM(a, A, [B1, B2, B3])

# Observations
Yt = [1, 3, 1]

# Question c)
# `loglikelihood` uses the forward algorithm to compute the sequence probability.
@show exp(loglikelihood(hmm, Yt))
# exp(loglikelihood(hmm, Yt)) = 0.03374000000000003

# Question d)
@show viterbi(hmm, Yt)
# viterbi(hmm, Yt) = [2, 1, 2]
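As a cross-check (not part of the original exchange), the likelihood above can also be obtained by brute force, summing P(z, Y) over all 3^3 state paths, which is exactly what the forward algorithm computes efficiently. A small Python sketch using the same (uncorrected) matrices as the Julia block above:

```python
from itertools import product

a = [0.3, 0.2, 0.5]                                   # initial probabilities
A = [[0, 0.2, 0.8], [1, 0, 0], [0.4, 0.4, 0.2]]       # transitions (as in question a)
B = [[0, 0.5, 0.5], [1.0, 0, 0], [0.1, 0.2, 0.7]]     # B[i][k] = P(Y_t = k+1 | Z_t = i+1)
Y = [1, 3, 1]

# Sum the joint probability P(z, Y) over every possible state path z.
total = 0.0
for path in product(range(3), repeat=len(Y)):
    p = a[path[0]] * B[path[0]][Y[0] - 1]
    for t in range(1, len(Y)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][Y[t] - 1]
    total += p

print(total)  # ≈ 0.03374, matching exp(loglikelihood(hmm, Yt))
```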

Thanks a lot for your quick response and the example. It would have taken me hours to figure this out on my own.

The output of the Viterbi algorithm matches the result of my hand calculation (the provided solution got the second state wrong).

However, using forward, backward, and (log)likelihood I don't get the same numbers as in my hand calculation (and here I think the solution made no mistake). Can you guess where the difference comes from?

Sorry I had not seen the correction in the pdf.

In my example I used the transition matrix provided in question a) without verifying it.
But indeed it does not correspond to the transition diagram (P_{12} should be 0.8, not 0.2, there should be no self-transition in S3, etc.).

If we use the transition matrix that corresponds to the diagram (the one provided in the correction), we get the right results :)

I will also document this use case (HMM from emission matrix) to make it easier for the users.

using Distributions
using HMMBase

# Initial Probabilities
a = [0.3, 0.2, 0.5]

# Transition Matrix
A = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# Observations Distributions (for each state)
B1 = Categorical([0, 0.5, 0.5])
B2 = Categorical([1, 0, 0])
B3 = Categorical([0.1, 0.2, 0.7])

# Build the HMM
hmm = HMM(a, A, [B1, B2, B3])

# Observations
Yt = [1, 3, 1]

# Question c)
@show exp(loglikelihood(hmm, Yt))
# exp(loglikelihood(hmm, Yt)) = 0.10519999999999999 ≈ 0.1052

# Question d)
@show viterbi(hmm, Yt)
# viterbi(hmm, Yt) = [2, 3, 2]

Also, if you get different results from the forward/backward algorithms, it is because in HMMBase (as in most forward/backward implementations) the values are scaled at each time step so that they sum to 1, for numerical stability.

forward(hmm, Yt)[1]
# 3×3 Array{Float64,2}:
#  0.0       0.8       0.2
#  0.517241  0.0       0.482759
#  0.0       0.988593  0.0114068
# Correction is unscaled. HMMBase is scaled.
# t = 1
# Unscaled: 0 0.2      0.05      (sum = 0.25)
# Scaled:   0 0.2/0.25 0.05/0.25
#      =    0 0.8      0.2

# t = 2
# Unscaled: 0.06       0 0.056      (sum = 0.116)
# Scaled:   0.06/0.116 0 0.056/0.116
#      =    0.517241   0 0.482759
# ...
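To make the scaling concrete, here is a sketch of the unscaled forward recursion for the corrected model (written in Python rather than Julia, purely to check the arithmetic, and not HMMBase's implementation). Normalizing each row reproduces the scaled values shown above, and the sum of the last unscaled row is the sequence likelihood:

```python
a = [0.3, 0.2, 0.5]                                   # initial probabilities
A = [[0, 0.8, 0.2], [0.6, 0, 0.4], [0.0, 1.0, 0.0]]   # corrected transitions
B = [[0, 0.5, 0.5], [1.0, 0, 0], [0.1, 0.2, 0.7]]     # B[i][k] = P(Y_t = k+1 | Z_t = i+1)
Y = [1, 3, 1]

# alpha_1(i) = a_i * B_i(y_1)
alpha = [[a[i] * B[i][Y[0] - 1] for i in range(3)]]
# alpha_t(j) = B_j(y_t) * sum_i alpha_{t-1}(i) * A_ij
for t in range(1, len(Y)):
    prev = alpha[-1]
    alpha.append([B[j][Y[t] - 1] * sum(prev[i] * A[i][j] for i in range(3))
                  for j in range(3)])

scaled = [[v / sum(row) for v in row] for row in alpha]
likelihood = sum(alpha[-1])  # P(Y) from the unscaled recursion

print(likelihood)  # ≈ 0.1052
print(scaled[1])   # ≈ [0.517241, 0.0, 0.482759]
```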

Thanks again for the thorough explanation. I also read that scaling is important to avoid numerical issues, especially in larger HMMs. Since HMMBase also scales, I assume there is no way to recover the unscaled values, or is there?

There is currently no way.
I'll try to add this possibility in a future version, at least for educational purposes.

I've added a new constructor so that you can build an HMM directly from an emission matrix (instead of building categorical distributions by hand).

Using your example:

a = [0.3, 0.2, 0.5]
A = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
B = [0 0.5 0.5; 1.0 0 0; 0.1 0.2 0.7]
hmm = HMM(a, A, B)
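As a quick sanity check when building a model this way (sketched here in Python, independently of HMMBase): a, each row of A, and each row of B must be probability vectors, i.e. non-negative and summing to 1.

```python
a = [0.3, 0.2, 0.5]
A = [[0, 0.8, 0.2], [0.6, 0, 0.4], [0.0, 1.0, 0.0]]
B = [[0, 0.5, 0.5], [1.0, 0, 0], [0.1, 0.2, 0.7]]

# The initial vector and every row of A and B must sum to 1
# (up to floating-point tolerance) and contain no negative entries.
assert abs(sum(a) - 1) < 1e-12
for row in A + B:
    assert abs(sum(row) - 1) < 1e-12
    assert all(p >= 0 for p in row)
print("ok")
```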

Wow, pretty cool. That definitely saves some typing.

@maxmouchet I did the Viterbi calculations by hand and don't get the same result at t=2 as your implementation. The implementation takes state 3 with p=0.056 over state 1 with p=0.06. Is there some numerical issue, or do I misunderstand the Viterbi algorithm?


In the Viterbi algorithm the state sequence is inferred backwards, starting from the last state.
Your computation for V_1 is correct but for V_2 you also need to record the ancestors.
That is, argmax_i P_{ij} V_1(i), or more generally, A[t,j] = argmax_i P_{ij} V_{t-1}(i).

For V_2(1), P_{i1} V_1(i) is maximized when i = 2, so the ancestor is 2.
For V_2(2), P_{i2} V_1(i) is maximized when i = 3, so the ancestor is 3.
For V_2(3), P_{i3} V_1(i) is maximized when i = 2, so the ancestor is 2.
And so on for V_3(.).

At the end you should get the following ancestors:

A[2,1] = 2
A[2,2] = 3
A[2,3] = 2
A[3,1] = _
A[3,2] = 3
A[3,3] = 1

Finally, to find the state sequence, we start from z_T = argmax_j V_T(j), and z_t = A[t+1, z_{t+1}].
Here z_3 = 2, z_2 = A[3,2] = 3 and z_1 = A[2,3] = 2.
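The recursion and backtracking described above can be sketched in code (Python rather than Julia, as an illustration of the algorithm, not of HMMBase's internal implementation):

```python
a = [0.3, 0.2, 0.5]                                   # initial probabilities
A = [[0, 0.8, 0.2], [0.6, 0, 0.4], [0.0, 1.0, 0.0]]   # corrected transitions
B = [[0, 0.5, 0.5], [1.0, 0, 0], [0.1, 0.2, 0.7]]     # B[i][k] = P(Y_t = k+1 | Z_t = i+1)
Y = [1, 3, 1]
K, T = 3, len(Y)

# V[t][j]: probability of the best path ending in state j+1 at time t+1.
# anc[t][j]: ancestor (best predecessor, 1-based) of state j+1 at time t+1.
V = [[a[i] * B[i][Y[0] - 1] for i in range(K)]]
anc = [[0] * K]
for t in range(1, T):
    row, arow = [], []
    for j in range(K):
        scores = [V[t - 1][i] * A[i][j] for i in range(K)]
        best = max(range(K), key=lambda i: scores[i])
        arow.append(best + 1)
        row.append(B[j][Y[t] - 1] * scores[best])
    V.append(row)
    anc.append(arow)

# Backtrack: start from the most likely final state, then follow ancestors.
z = [max(range(K), key=lambda j: V[-1][j]) + 1]
for t in range(T - 1, 0, -1):
    z.append(anc[t][z[-1] - 1])
z.reverse()

print(z)  # [2, 3, 2]
```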

Thanks a lot for the explanation. Your way of explaining it is much better than the textbook I have.

I close this issue for now. Feel free to reopen if you have another question.