Sampling error with GibbsSampling for some Bayesian Networks
huetstep opened this issue · comments
Subject of the issue
GibbsSampling for Bayesian Networks can raise an error when calling sample(size=3). The error depends on the order in which the CPDs are given when defining the Bayesian Network, and it is raised while computing the product of CPDs.
For instance, the product P(x1).P(x2|x1,x3).P(x3) fails with a ValueError about the shape of the CPD values.
Your environment
- pgmpy version v0.1.21
- Python version 3.9
- Operating System Ubuntu 22.04
Steps to reproduce
For the following Bayesian Network:
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.models import BayesianNetwork
    from pgmpy.sampling import GibbsSampling

    # Network structure: x1 -> x3, x1 -> x2, x3 -> x2
    model2 = BayesianNetwork(
        [
            ('x1', 'x3'),
            ('x1', 'x2'),
            ('x3', 'x2'),
        ]
    )
    cpd_x1 = TabularCPD(variable='x1', variable_card=3,
                        values=[[0.6], [0.3], [0.1]])
    cpd_x2 = TabularCPD(variable='x2', variable_card=2,
                        values=[[0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7],
                                [0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3]],
                        evidence=['x1', 'x3'],
                        evidence_card=[3, 3])
    cpd_x3 = TabularCPD(variable='x3', variable_card=3,
                        values=[[0.1, 0.333, 0.6],
                                [0.3, 0.333, 0.3],
                                [0.6, 0.334, 0.1]],
                        evidence=['x1'],
                        evidence_card=[3])
    # The order in which the CPDs are added matters for reproducing the error
    model2.add_cpds(cpd_x1, cpd_x2, cpd_x3)
    gibbs_chain = GibbsSampling(model2)
    gibbs_chain.sample(size=3)
Expected behaviour
gibbs_chain.sample(size=3) should provide a DataFrame of 3 samples.
Actual behaviour
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/sampling/Sampling.py:403, in GibbsSampling.__init__(self, model)
401 super(GibbsSampling, self).__init__()
402 if isinstance(model, BayesianNetwork):
--> 403 self._get_kernel_from_bayesian_model(model)
404 elif isinstance(model, MarkovNetwork):
405 self._get_kernel_from_markov_model(model)
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/sampling/Sampling.py:429, in GibbsSampling._get_kernel_from_bayesian_model(self, model)
427 cpds = [cpd for cpd in model.cpds if var in cpd.scope()]
--> 428 prod_cpd = factor_product(*cpds)
429 kernel = {}
430 scope = set(prod_cpd.scope())
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/base.py:76, in factor_product(*args)
74 return args[0].copy()
75 else:
---> 76 return reduce(lambda phi1, phi2: phi1 * phi2, args)
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/base.py:76, in factor_product.<locals>.<lambda>(phi1, phi2)
74 return args[0].copy()
75 else:
---> 76 return reduce(lambda phi1, phi2: phi1 * phi2, args)
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/DiscreteFactor.py:930, in DiscreteFactor.__mul__(self, other)
929 def __mul__(self, other):
--> 930 return self.product(other, inplace=False)
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/DiscreteFactor.py:697, in DiscreteFactor.product(self, phi1, inplace)
654 def product(self, phi1, inplace=True):
655 """
656 DiscreteFactor product with phi1.
657
(...)
695 [55, 77]]]]
696 """
--> 697 phi = self if inplace else self.copy()
698 if isinstance(phi1, (int, float)):
699 phi.values *= phi1
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/CPD.py:299, in TabularCPD.copy(self)
297 evidence = self.variables[1:] if len(self.variables) > 1 else None
298 evidence_card = self.cardinality[1:] if len(self.variables) > 1 else None
--> 299 return TabularCPD(
300 self.variable,
301 self.variable_card,
302 self.get_values(),
303 evidence,
304 evidence_card,
305 state_names=self.state_names.copy(),
306 )
File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/CPD.py:133, in TabularCPD.__init__(self, variable, variable_card, values, evidence, evidence_card, state_names)
131 expected_cpd_shape = (variable_card, np.product(evidence_card))
132 if values.shape != expected_cpd_shape:
--> 133 raise ValueError(
134 f"values must be of shape {expected_cpd_shape}. Got shape: {values.shape}"
135 )
137 if not isinstance(state_names, dict):
138 raise ValueError(
139 f"state_names must be of type dict. Got {type(state_names)}"
140 )
ValueError: values must be of shape (2, 24). Got shape: (3, 24)
The error can be avoided by converting each CPD to a factor before taking the product. In Sampling.py, class GibbsSampling, method _get_kernel_from_bayesian_model(self, model):
    for var in self.variables:
        other_vars = [v for v in self.variables if var != v]
        other_cards = [self.cardinalities[v] for v in other_vars]
        # Replaces: cpds = [cpd for cpd in model.cpds if var in cpd.scope()]
        cpds = [cpd.to_factor() for cpd in model.cpds if var in cpd.scope()]
        prod_cpd = factor_product(*cpds)
        kernel = {}
        scope = set(prod_cpd.scope())
        for tup in itertools.product(*[range(card) for card in other_cards]):
            states = [State(v, s) for v, s in zip(other_vars, tup) if v in scope]
            # Replaces: prod_cpd_reduced = prod_cpd.to_factor().reduce(states, inplace=False)
            prod_cpd_reduced = prod_cpd.reduce(states, inplace=False)
            kernel[tup] = prod_cpd_reduced.values / sum(prod_cpd_reduced.values)
        self.transition_models[var] = kernel
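
To make the intent of this change concrete, here is a minimal pure-Python sketch of the same computation for the network above: every CPD whose scope contains the variable is treated as a plain factor, the factors are multiplied, and the result is normalised over the variable's states. This is not pgmpy code. The helpers make_factor and conditional are hypothetical names introduced only for illustration, and the table layout (last variable varies fastest) is an assumption chosen to match how the values lists are written in the reproduction script.

```python
import itertools


def make_factor(scope, cards, flat_values):
    """Build a factor as (scope, table); the last variable in scope varies fastest."""
    assignments = list(itertools.product(*[range(c) for c in cards]))
    if len(assignments) != len(flat_values):
        raise ValueError("number of values does not match the cardinalities")
    return tuple(scope), dict(zip(assignments, flat_values))


def conditional(factors, var, var_card, evidence):
    """Return P(var | evidence) from the product of all factors containing var."""
    weights = []
    for state in range(var_card):
        full = dict(evidence, **{var: state})
        weight = 1.0
        for scope, table in factors:
            if var in scope:  # same filter as `var in cpd.scope()`
                weight *= table[tuple(full[v] for v in scope)]
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


# Same numbers as cpd_x1, cpd_x2 and cpd_x3 in the reproduction script.
f_x1 = make_factor(["x1"], [3], [0.6, 0.3, 0.1])
f_x2 = make_factor(
    ["x2", "x1", "x3"], [2, 3, 3],
    [0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7,
     0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3],
)
f_x3 = make_factor(
    ["x3", "x1"], [3, 3],
    [0.1, 0.333, 0.6,
     0.3, 0.333, 0.3,
     0.6, 0.334, 0.1],
)
factors = [f_x1, f_x2, f_x3]

# One entry of the Gibbs kernel for x3: P(x3 | x1=0, x2=0).
kernel_x3 = conditional(factors, "x3", 3, {"x1": 0, "x2": 0})
print(kernel_x3)
```

Because the factors here carry no notion of "child variable" versus "evidence", the result does not depend on the order in which they are listed, which is the property the to_factor() conversion is meant to restore.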