Error calculating query, IndexError
zSavT opened this issue · comments
Your environment
- pgmpy version 0.1.19
- Python version 3.10
- Windows 11
Steps to reproduce
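import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, K2Score, MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianNetwork
# df_smoke is the smoking dataset, already loaded as a pandas DataFrame
# Converting all values within the dataframe to integers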
df_smoke_int = np.array(df_smoke, dtype=int)
df_smoke = pd.DataFrame(df_smoke_int, columns=df_smoke.columns)
# Creation of X feature and target y
X_train = df_smoke
# Creation of the network structure
k2 = K2Score(X_train)
hc_k2 = HillClimbSearch(X_train)
k2_model = hc_k2.estimate(scoring_method=k2)
# Creation of the network
bNet = BayesianNetwork(k2_model.edges())
bNet.fit(df_smoke, estimator=MaximumLikelihoodEstimator)
data = VariableElimination(bNet) # inference
# Potential non-smoker subject
notSmoker = data.query(variables=['smoking'],
                       evidence={'age': 29, 'height(cm)': 170, 'weight(kg)': 60, 'Gtp': 31,
                                 'triglyceride': 113, 'LDL': 116, 'systolic': 102, 'relaxation': 71,
                                 'HDL': 103, 'hemoglobin': 13, 'serum creatinine': 2, 'tartar': 0})
Error message
C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py:540: UserWarning: Found unknown state name. Trying to switch to using all state names as state numbers
warnings.warn(
Traceback (most recent call last):
File "C:\Users\verio\repo\icon_project\pythonProject\main.py", line 315, in <module>
notSmoker = data.query(variables=['smoking'],
File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 305, in query
result = self._variable_elimination(
File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 176, in _variable_elimination
working_factors = self._get_working_factors(evidence)
File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 46, in _get_working_factors
factor_reduced = factor.reduce(
File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py", line 560, in reduce
phi.values = phi.values[tuple(slice_)]
IndexError: index 29 is out of bounds for axis 0 with size 14
Expected behaviour
Be able to enter every possible value.
Actual behaviour
Some values, such as 29 for age in this case, are not accepted. The same happens with other variables and other values.
N.B.
I'm a beginner and I'm not 100% sure this is a library problem; it may be my mistake.
@zSavT The problem in this case is that the training data doesn't have some of the values that you are passing as evidence. For example, the age variable doesn't have the value 29 in the training dataset:
In [14]: df.age.unique()
Out[14]: array([40, 55, 30, 45, 50, 35, 60, 25, 65, 20, 80, 75, 70, 85])
Because of this, pgmpy doesn't generate probability values for the state 29, hence the error. You can provide these additional missing states to the fit function using the state_names argument while learning the parameters. For example, if you would like to include a state 29 for age:
bNet.fit(df_smoke, estimator=MaximumLikelihoodEstimator, state_names={'age': list(df_smoke.age.unique()) + [29]})
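Since the same error can occur for other variables, it may help to check all evidence values against the training data before querying. The following is only a minimal sketch in plain Python: the helper names missing_evidence_states and extended_state_names are hypothetical (not part of pgmpy), and the small observed dictionary is a toy stand-in for the real columns, using the age values listed above.

```python
def missing_evidence_states(observed, evidence):
    """Return {variable: value} for evidence values never seen in training.

    `observed` maps each variable name to the values seen in the training data.
    Hypothetical helper for illustration; not a pgmpy function.
    """
    missing = {}
    for var, value in evidence.items():
        if value not in set(observed[var]):
            missing[var] = value
    return missing


def extended_state_names(observed, evidence):
    """Build a state_names-style mapping that also lists unseen evidence values."""
    state_names = {}
    for var, value in missing_evidence_states(observed, evidence).items():
        # Observed states first, then the extra state requested as evidence.
        state_names[var] = sorted(set(observed[var])) + [value]
    return state_names


# Toy stand-in for the training columns (age values taken from this thread).
observed = {
    "age": [40, 55, 30, 45, 50, 35, 60, 25, 65, 20, 80, 75, 70, 85],
    "tartar": [0, 1],
}
evidence = {"age": 29, "tartar": 0}

missing = missing_evidence_states(observed, evidence)
state_names = extended_state_names(observed, evidence)
print(missing)
print(state_names)
```

With the real data, observed could be built as {c: df_smoke[c].unique() for c in df_smoke.columns}, and the resulting mapping passed as state_names to bNet.fit as in the call above. Keep in mind that a state added this way has no observations in the training data, so the probabilities learned for it are not informed by any samples.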
Ok, thank you very much!