pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home Page: https://pgmpy.org/

Pandas 1.5 issue for chisquare test.

robertness opened this issue

Subject of the issue

Running chi_square fails locally on DataFrames with pandas 1.5. It works again with pandas 1.4.

This was found by one of my students on a long-standing coding assignment. The student suspects the cause is how pgmpy's power_divergence method iterates over a pandas groupby: it assumes that each iteration returns a tuple, but pandas 1.5 changed the output to be length 1, so this line fails.
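To make the suspected failure mode concrete, here is a minimal sketch of version-tolerant groupby iteration. It uses only pandas, the helper name `iter_groups` is hypothetical, and it is not a claim about how pgmpy's power_divergence is actually written. The assumption is that, depending on the pandas version, grouping by a one-element list may yield either a scalar key or a one-element tuple, so the sketch normalizes the key to a tuple before use.

```python
import pandas as pd


def iter_groups(data, by):
    """Yield (key, frame) pairs where key is always a tuple.

    Hypothetical helper for illustration only; not part of pgmpy.
    """
    for key, frame in data.groupby(list(by)):
        # Some pandas versions yield a scalar key for a one-column
        # grouper, others a one-element tuple. Normalize to a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        yield key, frame


df = pd.DataFrame({"X": [1.5, 1.2, 1.4], "Y": [3.1, 2.3, 1.4], "Z": [1, 1, 0]})

for key, frame in iter_groups(df, ["Z"]):
    # key can be unpacked or indexed the same way for any number of columns
    print(key, len(frame))
```

Code written this way does not depend on which key shape the installed pandas version produces.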

Your environment

The student got the error locally, using pandas>=1.5 and pgmpy==0.1.21. I'll ping them to update this issue with their local Python and OS.

Steps to reproduce

test_result = chi_square(X=X, Y=Y, Z=Z, data=data, boolean=True, significance_level=significance)

Expected behaviour

Get a Boolean output.

Actual behaviour

Error.

@robertness Thanks for reporting the issue, but I don't seem to be able to reproduce the error on pandas 1.5.2. Could you possibly ask your student if they could share some reproducible code where they got the error? Here's my test script where it seems to work fine:

In [16]: import pandas as pd

In [17]: from pgmpy.utils import get_example_model

In [18]: from pgmpy.estimators.CITests import chi_square

In [19]: pd.__version__
Out[19]: '1.5.2'

In [20]: model = get_example_model('asia')

In [21]: data = model.simulate()
Generating for node: dysp: 100%|█████████████████████████████████████████| 8/8 [00:00<00:00, 1676.05it/s]

In [22]: chi_square('asia', 'smoke', ['tub', 'either', 'dysp'], data, significance_level=0.05)
Out[22]: False

Hey @ankurankan, after reinstalling pandas I no longer see the error. I'm not sure what changed on my end, as I originally thought I had tested on 1.5, 1.5.1, and 1.5.2, but they all seem to be working fine for me now (just with a regular FutureWarning). Here's the code I was using to test:

from pgmpy.estimators.CITests import chi_square
import numpy as np
import pandas as pd
import pgmpy

chi_square(
    X="X",
    Y="Y",
    Z=["Z"],
    data=pd.DataFrame({
        "X": [1.5, 1.2, 1.4],
        "Y": [3.1, 2.3, 1.4],
        "Z": [1, 1, 0],
    }),
    significance_level=0.05
)

Apologies on this one, looks like everything is working fine!

@kylejcaron Thanks for confirming. Closing for now.