Error loading Bayesian model
Zhu-811 opened this issue
inner peace commented
Subject of the issue
1. Using BIFReader to load a Bayesian network whose node names contain Chinese characters raises the following error:
mm = BIFReader('ci_demo1.bif')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/readwrite/BIF.py", line 96, in __init__
self.variable_names = self.get_variables()
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/readwrite/BIF.py", line 206, in get_variables
name = self.name_expr.searchString(block)[0][0]
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pyparsing/results.py", line 193, in __getitem__
return self._toklist[i]
IndexError: list index out of range
2. When the node values are strings, calling infer.estimate_ate raises the following error:
df
       A    B    C      D      E
0    one  two  one  three    one
1  three  one  two    two  three
2    two  one  two    two    one
infer.estimate_ate(edge[0], edge[1], data=df)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/inference/CausalInference.py", line 391, in estimate_ate
ate = [
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/inference/CausalInference.py", line 392, in <listcomp>
self.estimator.fit(X=x1, Y=x2, Z=s, data=data, **kwargs)._get_ate()
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/estimators/LinearModel.py", line 25, in fit
self.estimator = self._model(X, Y, Z, data, **kwargs).fit()
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/pgmpy/estimators/LinearModel.py", line 22, in _model
return self.estimator(endog=endog, exog=exog, **kwargs)
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 922, in __init__
super(OLS, self).__init__(endog, exog, missing=missing,
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 748, in __init__
super(WLS, self).__init__(endog, exog, missing=missing,
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 202, in __init__
super(RegressionModel, self).__init__(endog, exog, **kwargs)
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 270, in __init__
super().__init__(endog, exog, **kwargs)
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 95, in __init__
self.data = self._handle_data(endog, exog, missing, hasconst,
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/model.py", line 135, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 675, in handle_data
return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 84, in __init__
self.endog, self.exog = self._convert_endog_exog(endog, exog)
File "/home/slht/miniconda3/envs/p112p39/lib/python3.9/site-packages/statsmodels/base/data.py", line 509, in _convert_endog_exog
raise ValueError("Pandas data cast to numpy dtype of object. "
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
Your environment
- pgmpy 0.1.23
- Python 3.9.17
- Operating System: Ubuntu 18.04
Steps to reproduce
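import pandas as pd
from pgmpy.estimators import BayesianEstimator, HillClimbSearch
from pgmpy.models import BayesianNetwork
from pgmpy.readwrite import BIFReader

data1 = [['one', 'two', 'one', 'three', 'one'],
         ['three', 'one', 'two', 'two', 'three'],
         ['two', 'one', 'two', 'two', 'one']]
# Column (node) names are Chinese: "node one" through "node five"
df = pd.DataFrame(data1, columns=['节点一', '节点二', '节点三', '节点四', '节点五'])
# Learn the structure, then fit the CPDs on the same data
est = HillClimbSearch(df)
best_model = est.estimate(scoring_method='k2score')
edges = best_model.edges()
bn = BayesianNetwork(edges)
bn.fit(df, estimator=BayesianEstimator)
# Write the network to BIF, then read it back (this raises the IndexError)
bn.save('ci_demo1.bif')
model = BIFReader('ci_demo1.bif')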
Ankur Ankan commented
@Zhu-811 Sorry for the late reply. The Chinese character issue with BIFReader should now be fixed in the dev branch (installation instructions: https://pgmpy.org/started/install.html).
In [22]: data1 = [['one', 'two', 'one', 'three', 'one'],
...: ['three', 'one', 'two', 'two', 'three'],
...: ['two', 'one', 'two', 'two', 'one']]
...: df = pd.DataFrame(data1, columns=['节点一', '节点二', '节点三', '节点四', '节点五'])
...: est = HillClimbSearch(df)
...: best_model = est.estimate(scoring_method='k2score')
...: edges = best_model.edges()
...:
...: bn = BayesianNetwork(edges)
...: bn.fit(df, estimator=BayesianEstimator)
...: bn.save('ci_demo1.bif')
...:
...: model = BayesianNetwork.load('ci_demo1.bif')
0%| | 6/1000000 [00:00<2:53:38, 95.98it/s]
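For context, the IndexError in the original traceback is raised at name_expr.searchString(block)[0][0], which means the name pattern found no match in the variable block. The sketch below illustrates one way that can happen, assuming an ASCII-only name pattern. The regular expressions and the sample block are hypothetical stand-ins for illustration, not pgmpy's actual grammar or fix.

```python
import re

# Assumption: simplified stand-ins for a BIF "variable <name> {" rule.
# These are NOT pgmpy's real parser expressions.
ascii_name = re.compile(r"variable\s+([A-Za-z0-9_\-]+)\s*\{")
unicode_name = re.compile(r"variable\s+([\w\-]+)\s*\{")

# Hypothetical variable block with a Chinese node name.
block = "variable 节点一 {\n  type discrete [ 3 ] { one, three, two };\n}"

ascii_hits = ascii_name.findall(block)      # ASCII-only class cannot match the name
unicode_hits = unicode_name.findall(block)  # \w is Unicode-aware for str patterns

print(ascii_hits)
print(unicode_hits)
```

Indexing the empty ascii_hits list with [0] would raise the same kind of IndexError as in the report, while the Unicode-aware pattern recovers the node name.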
- The estimate_ate method currently only works for continuous datasets. To estimate the ATE between two variables, we need to go through every path connecting them (except confounding paths) and then combine the path coefficients of the edges on these paths (multiplying along each path and summing across paths). This approach only works for combining regression coefficients, and hence we can only do this for continuous variables.
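The path-combination idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pgmpy's implementation: the graph, the coefficient values, and the helper names directed_paths and total_effect are all hypothetical, and the model is assumed to be a linear DAG in which only directed paths from treatment to outcome are followed, so the confounding path through Z is skipped.

```python
from math import prod

# Hypothetical path coefficients of a linear model: edge -> regression coefficient.
coefficients = {
    ("X", "M"): 0.8,
    ("M", "Y"): 0.5,
    ("X", "Y"): 0.3,
    ("Z", "X"): 0.4,  # Z confounds X and Y; it lies on no directed X -> Y path
    ("Z", "Y"): 0.6,
}


def directed_paths(edges, source, target, path=()):
    """Yield every directed path from source to target as a tuple of edges (DAG assumed)."""
    if source == target:
        yield path
        return
    for (u, v) in edges:
        if u == source:
            yield from directed_paths(edges, v, target, path + ((u, v),))


def total_effect(coefficients, treatment, outcome):
    """Multiply coefficients along each directed path, then sum across paths."""
    return sum(
        prod(coefficients[edge] for edge in path)
        for path in directed_paths(coefficients, treatment, outcome)
    )


paths = list(directed_paths(coefficients, "X", "Y"))
ate = total_effect(coefficients, "X", "Y")  # 0.8 * 0.5 + 0.3
print(paths, ate)
```

With string-valued (categorical) nodes there are no regression coefficients to multiply and sum in this way, which is why the approach is limited to continuous variables.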