pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home Page:https://pgmpy.org/

Error loading Bayesian model

Zhu-811 opened this issue · comments

Subject of the issue

1. Loading a Bayesian network whose node names contain Chinese characters with BIFReader raises the following error:
mm = BIFReader('ci_demo1.bif')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/readwrite/BIF.py", line 96, in __init__
    self.variable_names = self.get_variables()
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/readwrite/BIF.py", line 206, in get_variables
    name = self.name_expr.searchString(block)[0][0]
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pyparsing/results.py", line 193, in __getitem__
    return self._toklist[i]
IndexError: list index out of range

2. When the node values are strings, calling infer.estimate_ate raises the following error:

df
       A    B    C      D      E
0    one  two  one  three    one
1  three  one  two    two  three
2    two  one  two    two    one
infer.estimate_ate(edge[0], edge[1], data=df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/inference/CausalInference.py", line 391, in estimate_ate
    ate = [
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/inference/CausalInference.py", line 392, in <listcomp>
    self.estimator.fit(X=x1, Y=x2, Z=s, data=data, **kwargs)._get_ate()
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/estimators/LinearModel.py", line 25, in fit
    self.estimator = self._model(X, Y, Z, data, **kwargs).fit()
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/estimators/LinearModel.py", line 22, in _model
    return self.estimator(endog=endog, exog=exog, **kwargs)
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 922, in __init__
    super(OLS, self).__init__(endog, exog, missing=missing,
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 748, in __init__
    super(WLS, self).__init__(endog, exog, missing=missing,
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 202, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 270, in __init__
    super().__init__(endog, exog, **kwargs)
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 95, in __init__
    self.data = self._handle_data(endog, exog, missing, hasconst,
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 135, in _handle_data
    data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 675, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 84, in __init__
    self.endog, self.exog = self._convert_endog_exog(endog, exog)
  File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 509, in _convert_endog_exog
    raise ValueError("Pandas data cast to numpy dtype of object. "
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

Your environment

  • pgmpy 0.1.23
  • Python 3.9.17
  • Operating system: Ubuntu 18.04

Steps to reproduce

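import pandas as pd
from pgmpy.estimators import BayesianEstimator, HillClimbSearch
from pgmpy.models import BayesianNetwork
from pgmpy.readwrite import BIFReader

data1 = [['one', 'two', 'one', 'three', 'one'],
         ['three', 'one', 'two', 'two', 'three'],
         ['two', 'one', 'two', 'two', 'one']]
# Node names are Chinese strings; the DataFrame must be built from data1 (defined above).
df = pd.DataFrame(data1, columns=['节点一', '节点二', '节点三', '节点四', '节点五'])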
est = HillClimbSearch(df)
best_model = est.estimate(scoring_method='k2score')
edges = best_model.edges()

bn = BayesianNetwork(edges)
bn.fit(df, estimator=BayesianEstimator)
bn.save('ci_demo1.bif')

model = BIFReader('ci_demo1.bif')

@Zhu-811 Sorry for the late reply. The Chinese character issue with the BIF reader should now be fixed in the dev branch (installation instructions: https://pgmpy.org/started/install.html).

In [22]: data1 = [['one', 'two', 'one', 'three', 'one'],
    ...:         ['three', 'one', 'two', 'two', 'three'],
    ...:         ['two', 'one', 'two', 'two', 'one']]
    ...: df = pd.DataFrame(data1, columns=['节点一', '节点二', '节点三', '节点四', '节点五'])
    ...: est = HillClimbSearch(df)
    ...: best_model = est.estimate(scoring_method='k2score')
    ...: edges = best_model.edges()
    ...: 
    ...: bn = BayesianNetwork(edges)
    ...: bn.fit(df, estimator=BayesianEstimator)
    ...: bn.save('ci_demo1.bif')
    ...: 
    ...: model = BayesianNetwork.load('ci_demo1.bif')
  0%|                                                    | 6/1000000 [00:00<2:53:38, 95.98it/s]
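
For context on the original IndexError: the traceback shows that name_expr.searchString(block) returned no match, which is what one would expect if the name pattern only accepted ASCII characters. The sketch below illustrates that failure mode with the standard re module. Both patterns and the sample block are hypothetical and only for illustration; pgmpy's actual grammar is built with pyparsing and may differ.

```python
import re

# Hypothetical patterns, NOT pgmpy's real grammar (which uses pyparsing).
ASCII_NAME = re.compile(r"variable\s+([A-Za-z0-9_-]+)")  # ASCII-only identifiers
UNICODE_NAME = re.compile(r"variable\s+([\w-]+)")        # \w is Unicode-aware for str patterns

# A made-up BIF-style variable block with a Chinese node name.
block = "variable 节点一 {\n    type discrete [ 3 ] { one, two, three };\n}"

ascii_names = ASCII_NAME.findall(block)
unicode_names = UNICODE_NAME.findall(block)

print(ascii_names)    # empty: indexing [0] here would raise IndexError, as in the traceback
print(unicode_names)  # the Chinese node name is captured
```

With the ASCII-only pattern the result list is empty, so taking its first element fails in the same way as the reported error.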
  2. estimate_ate currently works only for continuous datasets. To estimate the ATE between two variables, we need to go through every path connecting them (except confounding paths) and then combine (by multiplying or summing) the path coefficients of the edges on these paths. This method only works for combining regression coefficients, and hence we can only do this for continuous variables.
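
The path-combination idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pgmpy's implementation: it assumes a linear model in which each edge carries a known regression coefficient, multiplies coefficients along each directed path, and sums over paths. The helper names (directed_paths, total_effect) and the coefficient values are made up for the example.

```python
from math import prod

def directed_paths(edges, source, target, path=None):
    """Yield every directed path from source to target as a list of nodes."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for u, v in edges:
        if u == source and v not in path:
            yield from directed_paths(edges, v, target, path)

def total_effect(coefs, source, target):
    """Sum, over all directed paths, the product of edge coefficients on the path."""
    total = 0.0
    for p in directed_paths(coefs, source, target):
        total += prod(coefs[(u, v)] for u, v in zip(p, p[1:]))
    return total

# Hypothetical linear model: Z confounds X and Y; X affects Y directly and via M.
coefs = {
    ("X", "M"): 0.5,
    ("M", "Y"): 2.0,
    ("X", "Y"): 0.3,
    ("Z", "X"): 1.0,
    ("Z", "Y"): 4.0,
}

paths = list(directed_paths(coefs, "X", "Y"))
effect = total_effect(coefs, "X", "Y")
print(paths)
print(effect)
```

Only the directed paths X -> M -> Y and X -> Y contribute; the confounding route through Z is never followed because the search only moves along edge directions. With string-valued (categorical) nodes there is no single regression coefficient per edge to multiply, which is why this approach does not carry over to the data in the report.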