Sampling error with GibbsSampling for some Bayesian Networks

huetstep opened this issue a year ago

Subject of the issue

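GibbsSampling for Bayesian Networks can raise an error for some networks. In the example below the error is raised as soon as GibbsSampling(model2) is constructed, before sample(size=3) is called. Whether it occurs depends on the order in which the CPDs are given when the Bayesian Network is defined, and it is raised while those CPDs are multiplied together.
For instance, the product P(x1) · P(x2 | x1, x3) · P(x3 | x1) raises a ValueError about the shape of the CPD values.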
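Mathematically, the product of the CPDs does not depend on the order in which they are multiplied, so the order passed to add_cpds should not matter. Below is a minimal plain-Python sketch of that property for the CPDs of the model in "Steps to reproduce". The dict-based factor and the helpers make_factor, multiply and multiply_all are hypothetical illustrations, not pgmpy's API, and the column layout of the CPD tables (last evidence variable changing fastest) is assumed to match TabularCPD.

    from itertools import permutations, product
    from math import isclose

    # Hypothetical plain-Python factor (not pgmpy's API): a tuple
    # (variables, cardinalities, table), where table maps a tuple of states,
    # ordered like `variables`, to a non-negative number.
    def make_factor(variables, cards, value_of):
        table = {}
        for states in product(*(range(c) for c in cards)):
            table[states] = value_of(dict(zip(variables, states)))
        return (tuple(variables), tuple(cards), table)


    def multiply(f, g):
        """Pointwise product of two factors over the union of their scopes."""
        card = dict(zip(f[0], f[1]))
        card.update(zip(g[0], g[1]))
        variables = sorted(card)  # canonical order, whatever the operand order

        def value_of(assignment):
            f_val = f[2][tuple(assignment[v] for v in f[0])]
            g_val = g[2][tuple(assignment[v] for v in g[0])]
            return f_val * g_val

        return make_factor(variables, [card[v] for v in variables], value_of)


    def multiply_all(factors):
        result = factors[0]
        for factor in factors[1:]:
            result = multiply(result, factor)
        return result


    # CPD values from the reproduction. Column layout assumed to follow
    # TabularCPD: the last evidence variable changes fastest.
    X1 = [[0.6], [0.3], [0.1]]
    X2 = [[0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7],
          [0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3]]
    X3 = [[0.1, 0.333, 0.6], [0.3, 0.333, 0.3], [0.6, 0.334, 0.1]]

    p_x1 = make_factor(["x1"], [3], lambda a: X1[a["x1"]][0])
    p_x2 = make_factor(["x2", "x1", "x3"], [2, 3, 3],
                       lambda a: X2[a["x2"]][a["x1"] * 3 + a["x3"]])
    p_x3 = make_factor(["x3", "x1"], [3, 3], lambda a: X3[a["x3"]][a["x1"]])

    # Every ordering of the three CPDs yields the same joint distribution.
    reference = multiply_all([p_x1, p_x2, p_x3])
    for order in permutations([p_x1, p_x2, p_x3]):
        candidate = multiply_all(list(order))
        assert candidate[0] == reference[0]
        assert all(isclose(candidate[2][k], v) for k, v in reference[2].items())

    print(reference[0], round(sum(reference[2].values()), 9))  # ('x1', 'x2', 'x3') 1.0
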

Your environment

pgmpy version v0.1.21
Python version 3.9
Operating System Ubuntu 22.04

Steps to reproduce

For the following Bayesian Network:
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.models import BayesianNetwork
    from pgmpy.sampling import GibbsSampling

    model2 = BayesianNetwork(
        [
            ('x1', 'x3'),
            ('x1', 'x2'),
            ('x3', 'x2')
        ]
    )
    cpd_x1 = TabularCPD(variable='x1', variable_card=3,
                        values=[[0.6], [0.3], [0.1]])
    cpd_x2 = TabularCPD(variable='x2', variable_card=2,
                        values=[[0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7],
                                [0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3]],
                        evidence=['x1', 'x3'],
                        evidence_card=[3, 3])

    cpd_x3 = TabularCPD(variable='x3', variable_card=3,
                        values=[[0.1, 0.333, 0.6],
                                [0.3, 0.333, 0.3],
                                [0.6, 0.334, 0.1]],
                        evidence=['x1'],
                        evidence_card=[3])

    model2.add_cpds(cpd_x1, cpd_x2, cpd_x3)
    # With pgmpy v0.1.21 the ValueError below is raised here, while the sampler is built
    gibbs_chain = GibbsSampling(model2)
    gibbs_chain.sample(size=3)

Expected behaviour

gibbs_chain.sample(size=3) should return a DataFrame containing 3 samples.

Actual behaviour

Constructing GibbsSampling(model2) raises the following ValueError, so sample(size=3) is never reached:

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/sampling/Sampling.py:403, in GibbsSampling.__init__(self, model)
    401 super(GibbsSampling, self).__init__()
    402 if isinstance(model, BayesianNetwork):
    --> 403 self._get_kernel_from_bayesian_model(model)
    404 elif isinstance(model, MarkovNetwork):
    405 self._get_kernel_from_markov_model(model)

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/sampling/Sampling.py:429, in GibbsSampling._get_kernel_from_bayesian_model(self, model)
    427 cpds = [cpd for cpd in model.cpds if var in cpd.scope()]
    --> 428 prod_cpd = factor_product(*cpds)
    429 kernel = {}
    430 scope = set(prod_cpd.scope())

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/base.py:76, in factor_product(*args)
    74 return args[0].copy()
    75 else:
    ---> 76 return reduce(lambda phi1, phi2: phi1 * phi2, args)

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/base.py:76, in factor_product.<locals>.<lambda>(phi1, phi2)
    74 return args[0].copy()
    75 else:
    ---> 76 return reduce(lambda phi1, phi2: phi1 * phi2, args)

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/DiscreteFactor.py:930, in DiscreteFactor.__mul__(self, other)
    929 def __mul__(self, other):
    --> 930 return self.product(other, inplace=False)

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/DiscreteFactor.py:697, in DiscreteFactor.product(self, phi1, inplace)
    654 def product(self, phi1, inplace=True):
    655 """
    656 DiscreteFactor product with phi1.
    657
    (...)
    695 [55, 77]]]]
    696 """
    --> 697 phi = self if inplace else self.copy()
    698 if isinstance(phi1, (int, float)):
    699 phi.values *= phi1

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/CPD.py:299, in TabularCPD.copy(self)
    297 evidence = self.variables[1:] if len(self.variables) > 1 else None
    298 evidence_card = self.cardinality[1:] if len(self.variables) > 1 else None
    --> 299 return TabularCPD(
    300 self.variable,
    301 self.variable_card,
    302 self.get_values(),
    303 evidence,
    304 evidence_card,
    305 state_names=self.state_names.copy(),
    306 )

    File ~/soft/miniconda3/lib/python3.9/site-packages/pgmpy/factors/discrete/CPD.py:133, in TabularCPD.__init__(self, variable, variable_card, values, evidence, evidence_card, state_names)
    131 expected_cpd_shape = (variable_card, np.product(evidence_card))
    132 if values.shape != expected_cpd_shape:
    --> 133 raise ValueError(
    134 f"values must be of shape {expected_cpd_shape}. Got shape: {values.shape}"
    135 )
    137 if not isinstance(state_names, dict):
    138 raise ValueError(
    139 f"state_names must be of type dict. Got {type(state_names)}"
    140 )

    ValueError: values must be of shape (2, 24). Got shape: (3, 24)
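The check that fails (CPD.py lines 131 to 133 in the traceback) requires the values table of a TabularCPD to have one row per state of the variable and one column per combination of evidence states. A plain-Python re-statement of that check, for illustration only; check_values_shape is a hypothetical helper, not part of pgmpy, and the 3 x 24 table at the end is a made-up placeholder that merely reproduces the message format.

    from math import prod

    def check_values_shape(variable_card, evidence_card, values):
        """Re-statement of the shape check quoted from TabularCPD.__init__.

        `values` is a list of rows, one row per state of the variable; the
        expected number of columns is the product of the evidence cardinalities
        (1 when there is no evidence)."""
        expected = (variable_card, prod(evidence_card))
        got = (len(values), len(values[0]))
        if got != expected:
            raise ValueError(f"values must be of shape {expected}. Got shape: {got}")
        return expected


    # The three CPDs from the reproduction pass the check.
    cpd_x1_values = [[0.6], [0.3], [0.1]]
    cpd_x2_values = [[0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7],
                     [0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3]]
    cpd_x3_values = [[0.1, 0.333, 0.6], [0.3, 0.333, 0.3], [0.6, 0.334, 0.1]]

    print(check_values_shape(3, [], cpd_x1_values))      # (3, 1)
    print(check_values_shape(2, [3, 3], cpd_x2_values))  # (2, 9)
    print(check_values_shape(3, [3], cpd_x3_values))     # (3, 3)

    # A table whose row count (3) differs from variable_card (2) fails with the
    # same message format as the traceback (placeholder data).
    try:
        check_values_shape(2, [24], [[0.0] * 24 for _ in range(3)])
    except ValueError as err:
        print(err)  # values must be of shape (2, 24). Got shape: (3, 24)

Since every CPD in the reproduction passes, the failing TabularCPD is the one rebuilt inside TabularCPD.copy(), which DiscreteFactor.product calls on the left operand of a multiplication.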

huetstep · Answer 1 · Tue Mar 14 2023 19:07:41 GMT+0800 (China Standard Time)

The error goes away if, in Sampling.py (class GibbsSampling, method _get_kernel_from_bayesian_model(self, model)), each CPD is converted to a DiscreteFactor with to_factor() before the CPDs are multiplied:

    def _get_kernel_from_bayesian_model(self, model):
        # (any lines before this loop are unchanged)
        for var in self.variables:
            other_vars = [v for v in self.variables if var != v]
            other_cards = [self.cardinalities[v] for v in other_vars]
            # Original line: cpds = [cpd for cpd in model.cpds if var in cpd.scope()]
            # Convert each TabularCPD to a DiscreteFactor before multiplying, so the
            # product never goes through TabularCPD.copy(), which raised the ValueError.
            cpds = [cpd.to_factor() for cpd in model.cpds if var in cpd.scope()]
            prod_cpd = factor_product(*cpds)
            kernel = {}
            scope = set(prod_cpd.scope())
            for tup in itertools.product(*[range(card) for card in other_cards]):
                states = [State(v, s) for v, s in zip(other_vars, tup) if v in scope]
                # Original line: prod_cpd_reduced = prod_cpd.to_factor().reduce(states, inplace=False)
                # prod_cpd is already a DiscreteFactor, so to_factor() is no longer needed.
                prod_cpd_reduced = prod_cpd.reduce(states, inplace=False)
                # Normalise to get the conditional distribution of var.
                kernel[tup] = prod_cpd_reduced.values / sum(prod_cpd_reduced.values)
            self.transition_models[var] = kernel
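For each variable, the loop above computes its conditional distribution given all the other variables: the product of every CPD whose scope contains the variable, reduced to the other variables' states and normalised. A plain-Python sketch of that computation for x1 in the reproduced model; cpd_product_x1 and kernel_for_x1 are hypothetical names, and the column layout of cpd_x2 (last evidence variable changing fastest) is assumed to match TabularCPD.

    from itertools import product

    # CPD values from the reproduction; cpd_x2 columns are assumed to be ordered
    # as x1 * 3 + x3 (the last evidence variable changes fastest, as in TabularCPD).
    X1 = [[0.6], [0.3], [0.1]]
    X2 = [[0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7],
          [0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.3]]
    X3 = [[0.1, 0.333, 0.6], [0.3, 0.333, 0.3], [0.6, 0.334, 0.1]]


    def cpd_product_x1(x1, x2, x3):
        """P(x1) * P(x2 | x1, x3) * P(x3 | x1): every CPD whose scope contains x1."""
        return X1[x1][0] * X2[x2][x1 * 3 + x3] * X3[x3][x1]


    def kernel_for_x1():
        """Map each (x2, x3) state to the normalised distribution of x1."""
        kernel = {}
        for x2, x3 in product(range(2), range(3)):
            # "Reduce" the product to the current states of the other variables...
            unnormalised = [cpd_product_x1(x1, x2, x3) for x1 in range(3)]
            # ...then normalise, like values / sum(values) in the patched loop.
            total = sum(unnormalised)
            kernel[(x2, x3)] = [u / total for u in unnormalised]
        return kernel


    kernel = kernel_for_x1()
    print([round(p, 4) for p in kernel[(0, 0)]])  # [0.3183, 0.4543, 0.2274]
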

huetstep · Answer 2 · Tue Mar 14 2023 19:22:59 GMT+0800 (China Standard Time)

It seems to have been fixed in the dev branch by commit ed585f7.