pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home Page: https://pgmpy.org/

Error calculating query, IndexError

zSavT opened this issue

Your environment

  • pgmpy version 0.1.19
  • Python version 3.10
  • Windows 11

Steps to reproduce

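import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, K2Score, MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianNetwork

# df_smoke: the smoking dataset (see "dataset" below), already loaded as a pandas DataFrame

# Convert all values in the DataFrame to integers (floats are truncated toward zero)
df_smoke_int = np.array(df_smoke, dtype=int)
df_smoke = pd.DataFrame(df_smoke_int, columns=df_smoke.columns)

# Training data: every column, including the target 'smoking'
X_train = df_smoke

# Learn the network structure with hill climbing and the K2 score
k2 = K2Score(X_train)
hc_k2 = HillClimbSearch(X_train)
k2_model = hc_k2.estimate(scoring_method=k2)

# Build the network and learn its parameters
bNet = BayesianNetwork(k2_model.edges())
bNet.fit(df_smoke, estimator=MaximumLikelihoodEstimator)
data = VariableElimination(bNet)  # exact inference

# Potential non-smoker subject
notSmoker = data.query(variables=['smoking'],
                       evidence={'age': 29, 'height(cm)': 170, 'weight(kg)': 60, 'Gtp': 31, 'triglyceride': 113, 'LDL': 116, 'systolic': 102, 'relaxation': 71,
                                 'HDL': 103, 'hemoglobin': 13, 'serum creatinine': 2, 'tartar': 0})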

Error message


C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py:540: UserWarning: Found unknown state name. Trying to switch to using all state names as state numbers
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\verio\repo\icon_project\pythonProject\main.py", line 315, in <module>
    notSmoker = data.query(variables=['smoking'],
  File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 305, in query
    result = self._variable_elimination(
  File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 176, in _variable_elimination
    working_factors = self._get_working_factors(evidence)
  File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\inference\ExactInference.py", line 46, in _get_working_factors
    factor_reduced = factor.reduce(
  File "C:\Users\verio\repo\icon_project\pythonProject\venv\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py", line 560, in reduce
    phi.values = phi.values[tuple(slice_)]
IndexError: index 29 is out of bounds for axis 0 with size 14
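The UserWarning hints at what goes wrong. When an evidence value is not a known state name, pgmpy switches to treating all evidence values as state numbers. The value 29 is then used as a position along the 'age' axis, which has only 14 states. The sketch below is a hypothetical, single-variable model of that lookup; the function names are illustrative and are not pgmpy's API. It uses the 14 ages listed in the maintainer's reply.

```python
# Ages observed in the training data (from the maintainer's reply).
AGE_STATES = [40, 55, 30, 45, 50, 35, 60, 25, 65, 20, 80, 75, 70, 85]


def state_to_position(states, value):
    """Hypothetical, simplified name-to-index lookup (not pgmpy's actual code)."""
    if value in states:
        return states.index(value)  # known state name -> its position on the axis
    # Fallback described by the UserWarning: use the value itself as a state number.
    return value


def reduce_on_evidence(probabilities, states, value):
    """Select the entry for `value`, like slicing one axis of a factor."""
    return probabilities[state_to_position(states, value)]


if __name__ == "__main__":
    probabilities = [1 / len(AGE_STATES)] * len(AGE_STATES)
    print(reduce_on_evidence(probabilities, AGE_STATES, 40))  # known state: works
    try:
        reduce_on_evidence(probabilities, AGE_STATES, 29)  # position 29, but only 14 states
    except IndexError as err:
        print("IndexError:", err)  # IndexError: list index out of range
```

pgmpy stores factor values in a NumPy array rather than a list. With an array, the same out-of-range access produces the message seen in the traceback: "index 29 is out of bounds for axis 0 with size 14".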

Expected behaviour

Be able to pass any value as evidence.

Actual behaviour

Some values, such as 29 in this case, are not accepted. The same happens with other values of other variables.

dataset

N.B.
I’m a beginner, so I’m not 100% sure this is a library problem; it could be my mistake.

@zSavT The problem in this case is that the training data doesn't have some of the values that you are passing as evidence. For example, the age variable doesn't have the value 29 in the training dataset:

In [14]: df.age.unique()
Out[14]: array([40, 55, 30, 45, 50, 35, 60, 25, 65, 20, 80, 75, 70, 85])

Because of this, pgmpy doesn't generate probability values for state 29, hence the error. You can add the missing states when learning the parameters by passing the state_names argument to fit. For example, to include state 29 for age:

bNet.fit(df_smoke, estimator=MaximumLikelihoodEstimator, state_names={'age': list(df_smoke.age.unique()) + [29]})
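You can apply the same fix to every evidence variable, not just age. First list the evidence values that never occur in the training data, then extend state_names with them. Below is a minimal plain-Python sketch. It assumes the observed states are collected per column, for instance with df_smoke[col].unique(). The helper names unseen_evidence and extend_state_names are hypothetical.

```python
def unseen_evidence(observed_states, evidence):
    """Return {variable: value} for evidence values absent from the training data.

    observed_states maps each variable to the values it takes in the training
    data; a variable missing from it is treated as unseen.
    """
    return {
        var: value
        for var, value in evidence.items()
        if value not in observed_states.get(var, [])
    }


def extend_state_names(observed_states, evidence):
    """Copy observed_states and append every unseen evidence value to it."""
    state_names = {var: list(states) for var, states in observed_states.items()}
    for var, value in unseen_evidence(observed_states, evidence).items():
        state_names.setdefault(var, []).append(value)
    return state_names


# Hypothetical usage with the objects from the issue:
#   evidence = {'age': 29, 'height(cm)': 170, ...}
#   observed = {col: list(df_smoke[col].unique()) for col in df_smoke.columns}
#   bNet.fit(df_smoke, estimator=MaximumLikelihoodEstimator,
#            state_names=extend_state_names(observed, evidence))
```

MaximumLikelihoodEstimator builds probabilities from counts, so a state that never appears in the data will most likely get probability 0. Conditioning on such a state may then give an undefined (NaN) posterior. A smoothed estimator such as pgmpy's BayesianEstimator (for example with prior_type='BDeu') gives unseen states non-zero probability.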

Ok, thank you very much!