Evaluating log likelihood of BN with latent variable
zaRizk7 opened this issue
Subject of the issue
I am trying to evaluate a BN that has latent variables. Although I specified the latents when initializing the BN, evaluating it still requires the data to contain observed values for the latent variables.
Your environment
- pgmpy version: 0.1.22
- Python version: 3.9.16
- Operating System: Ubuntu 20.04
Steps to reproduce
Here is my toy code for trying to evaluate a latent BN:
import pandas as pd
import numpy as np
from pgmpy.models import BayesianNetwork
from pgmpy.metrics import BayesianModelProbability
from pgmpy.estimators import ExpectationMaximization as EM

# Random discrete data for the four observed variables "0".."3"
x = np.random.randint(0, 29, size=(30, 4))
df = pd.DataFrame(x, columns=[str(i) for i in range(4)])

# Each observed variable has a latent parent; the latents form a chain
model = BayesianNetwork(
    [
        ("Z1", "0"),
        ("Z2", "1"),
        ("Z3", "2"),
        ("Z4", "3"),
        ("Z1", "Z2"),
        ("Z2", "Z3"),
        ("Z3", "Z4"),
    ],
    latents=["Z1", "Z2", "Z3", "Z4"],
)

# Learn the CPDs with EM, since the latents are never observed
model.fit(df, estimator=EM)
model.check_model()

# Fails here: df has no columns for Z1..Z4
inference = BayesianModelProbability(model)
log_likelihood = inference.log_probability(df)
Expected behaviour
I expected this to work out of the box when the data contains every observed variable. The latent variables are never observed, so they should be marginalized out when the log-likelihood is computed.
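To make the expectation concrete, here is a minimal sketch of what marginalizing a latent variable means for the log-likelihood. It is plain Python, not pgmpy code, and it uses a toy network Z -> X with made-up CPD values (the names `p_z`, `p_x_given_z`, and `marginal_log_likelihood` are hypothetical and not part of the model above).

```python
import math

# Hypothetical CPDs for a toy network Z -> X (values are made up).
p_z = {0: 0.6, 1: 0.4}            # P(Z)
p_x_given_z = {                   # P(X | Z)
    0: {"a": 0.9, "b": 0.1},
    1: {"a": 0.2, "b": 0.8},
}


def marginal_log_likelihood(observations):
    """Sum of log P(x) over all rows, with the latent Z summed out."""
    total = 0.0
    for x in observations:
        # P(x) = sum over z of P(z) * P(x | z)
        p_x = sum(p_z[z] * p_x_given_z[z][x] for z in p_z)
        total += math.log(p_x)
    return total


# Only X is observed; Z never appears in the data.
data = ["a", "b", "a"]
ll = marginal_log_likelihood(data)
```

The data only ever contains values of X, yet the log-likelihood is well defined because Z is summed out for each row. This is the behaviour I expected from `log_probability` when the model declares latents.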
Actual behaviour
I got an error indicating that even the latent variables need to be present in the data. The traceback is:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_4121372/863970425.py in <cell line: 2>()
      1 inference = BayesianModelProbability(model)
----> 2 inference.log_probability(df)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in log_probability(self, data, ordering)
    122
    123 logp = np.array(
--> 124 [self._log_probability_node(data, ordering, node) for node in ordering]
    125 )
    126 return np.sum(logp, axis=0)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in <listcomp>(.0)
    122
    123 logp = np.array(
--> 124 [self._log_probability_node(data, ordering, node) for node in ordering]
    125 )
    126 return np.sum(logp, axis=0)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in _log_probability_node(self, data, ordering, node)
     61 # conditional dependencies E of the probed variable
     62 evidence = cpd.variables[:0:-1]
---> 63 evidence_idx = [ordering.index(ev) for ev in evidence]
     64 evidence_val = data[:, evidence_idx]
     65 evidence_no = np.empty_like(evidence_val, dtype=int)

~/mambaforge/envs/deep-learning/lib/python3.9/site-packages/pgmpy/metrics/bn_inference.py in <listcomp>(.0)
     61 # conditional dependencies E of the probed variable
     62 evidence = cpd.variables[:0:-1]
---> 63 evidence_idx = [ordering.index(ev) for ev in evidence]
     64 evidence_val = data[:, evidence_idx]
     65 evidence_no = np.empty_like(evidence_val, dtype=int)

ValueError: 'Z1' is not in list
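The failing line in the traceback can be reproduced in isolation. The sketch below assumes that `ordering` ends up being the list of column names in `df` (I have not verified this in the pgmpy source) and that the evidence for node "0" is its latent parent "Z1", as in the model above.

```python
# Assumption: ordering is taken from the DataFrame columns.
ordering = ["0", "1", "2", "3"]
# Parent of node "0" in the model; it is latent, so it has no column.
evidence = ["Z1"]

raised = False
try:
    # Same expression as line 63 of bn_inference.py in the traceback
    evidence_idx = [ordering.index(ev) for ev in evidence]
except ValueError:
    # list.index raises ValueError when the element is missing
    raised = True
```

If that assumption holds, any node with a latent parent will hit this lookup failure, regardless of the `latents` argument passed to the model.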