Evaluating log likelihood of BN with latent variable

Question

zaRizk7 opened this issue a year ago

Subject of the issue

I am trying to evaluate a BN that has latent variables. Although I specified the latents when initializing the BN, evaluating the log likelihood still requires the data to contain observed values for the latent variables.

Your environment

pgmpy version: 0.1.22
Python version: 3.9.16
Operating System: Ubuntu 20.04

Steps to reproduce

Here is my toy code for trying to evaluate a latent BN:

import pandas as pd
import numpy as np

from pgmpy.models import BayesianNetwork
from pgmpy.metrics import BayesianModelProbability
from pgmpy.estimators import ExpectationMaximization as EM

x = np.random.randint(0, 29, size=(30, 4))
df = pd.DataFrame(x, columns=[str(i) for i in range(4)])

model = BayesianNetwork(
    [
        ("Z1", '0'),
        ("Z2", '1'),
        ("Z3", '2'),
        ("Z4", '3'),
        ("Z1", "Z2"),
        ("Z2", "Z3"),
        ("Z3", "Z4"),
    ],
    latents=["Z1", "Z2", "Z3", "Z4"],
)

model.fit(df, estimator=EM)
model.check_model()

inference = BayesianModelProbability(model)
log_likelihood = inference.log_probability(df)

Expected behaviour

I expected this to work out of the box when every non-latent variable is observed in the data. In that case, it should technically marginalize over the latent variables and return the log likelihood of the observed columns.
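To make the expectation concrete, here is a rough sketch of what I mean by marginalizing the latents. It uses plain NumPy rather than pgmpy, and every number and name in it (`p_z1`, `p_trans`, `p_emit`, `marginal_log_likelihood`) is a hypothetical stand-in, not something taken from the fitted model. It also assumes binary latents and a single transition table shared across the chain, which is a simplification of the structure above.

```python
import itertools

import numpy as np

# Hypothetical toy parameters (NOT the CPDs learned by EM in this issue).
p_z1 = np.array([0.6, 0.4])          # P(Z1)
p_trans = np.array([[0.7, 0.3],      # P(Z_next | Z_prev), rows index Z_prev
                    [0.2, 0.8]])
p_emit = np.array([[0.5, 0.4, 0.1],  # P(X | Z), rows index Z
                   [0.1, 0.3, 0.6]])


def marginal_log_likelihood(row):
    """Return log P(row) by summing the joint over every latent assignment."""
    total = 0.0
    for z in itertools.product(range(len(p_z1)), repeat=len(row)):
        # Joint probability of this latent assignment and the observed row.
        joint = p_z1[z[0]] * p_emit[z[0], row[0]]
        for t in range(1, len(row)):
            joint *= p_trans[z[t - 1], z[t]] * p_emit[z[t], row[t]]
        total += joint
    return np.log(total)


# Two made-up observed rows; the latent columns are absent from the data.
rows = [(0, 1, 2, 2), (2, 2, 0, 1)]
log_likelihood = sum(marginal_log_likelihood(r) for r in rows)
print(log_likelihood)
```

Brute-force enumeration is exponential in the number of latents, so this only illustrates the quantity I expected to get back, not how it should be computed in practice.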

Actual behaviour

I get an error suggesting that even the latent variables need to be present in the data. The traceback is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_4121372/863970425.py in <cell line: 2>()
      1 inference = BayesianModelProbability(model)
----> 2 inference.log_probability(df)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in log_probability(self, data, ordering)
    122 
    123         logp = np.array(
--> 124             [self._log_probability_node(data, ordering, node) for node in ordering]
    125         )
    126         return np.sum(logp, axis=0)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in <listcomp>(.0)
    122 
    123         logp = np.array(
--> 124             [self._log_probability_node(data, ordering, node) for node in ordering]
    125         )
    126         return np.sum(logp, axis=0)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in _log_probability_node(self, data, ordering, node)
     61         # conditional dependencies E of the probed variable
     62         evidence = cpd.variables[:0:-1]
---> 63         evidence_idx = [ordering.index(ev) for ev in evidence]
     64         evidence_val = data[:, evidence_idx]
     65         evidence_no = np.empty_like(evidence_val, dtype=int)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in <listcomp>(.0)
     61         # conditional dependencies E of the probed variable
     62         evidence = cpd.variables[:0:-1]
---> 63         evidence_idx = [ordering.index(ev) for ev in evidence]
     64         evidence_val = data[:, evidence_idx]
     65         evidence_no = np.empty_like(evidence_val, dtype=int)

ValueError: 'Z1' is not in list
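
My reading of the traceback is that each CPD's evidence variables are looked up by name in `ordering`, and that `ordering` only contains the observed columns. The snippet below is a minimal sketch of that lookup in plain Python; the assumption that `ordering` is derived from the DataFrame columns is mine, and `data_columns` is a hypothetical name.

```python
# Assumption: ordering is built from the DataFrame columns, so it holds
# only the observed variables.
data_columns = ["0", "1", "2", "3"]

# "Z1" is the parent of node "0" in the model above, so it appears as evidence.
evidence = ["Z1"]

try:
    evidence_idx = [data_columns.index(ev) for ev in evidence]
    error_message = None
except ValueError as exc:
    # list.index raises ValueError when the item is missing from the list.
    error_message = str(exc)

print(error_message)
```

If that reading is right, any node with a latent parent would fail at this step, regardless of whether `latents` was passed to the model.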