Pandas 1.5 issue with the chi_square test.
robertness opened this issue
Subject of the issue
Running chi_square fails locally on DataFrames with pandas 1.5; it works again with pandas 1.4.
This was found by one of my students on a long-standing coding assignment. The student suspects the problem is in how pgmpy's power_divergence method iterates through a pandas groupby: it assumes a tuple is returned for each group, but pandas 1.5 changed the output to be length 1, so that line fails.
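To illustrate the suspected failure mode, here is a minimal sketch (not pgmpy's actual code) of iterating a groupby so that each group key is always a tuple. Depending on the pandas version, grouping by a one-element list of columns yields either a scalar key or a length-1 tuple (pandas 1.5 emits a FutureWarning about this), so code that unpacks the key as a tuple can break. `iter_groups_as_tuples` is a hypothetical helper; it assumes `by` is a column name or a list of column names.

```python
import pandas as pd


def iter_groups_as_tuples(data, by):
    """Hypothetical helper: yield (key_tuple, group) with key_tuple always a tuple."""
    # Accept a single column name as well as a list of names.
    by = [by] if isinstance(by, str) else list(by)
    for key, group in data.groupby(by):
        # A one-element grouper list may produce a scalar key or a
        # length-1 tuple depending on the pandas version; normalize it.
        if not isinstance(key, tuple):
            key = (key,)
        yield key, group


if __name__ == "__main__":
    df = pd.DataFrame({"X": [1.5, 1.2, 1.4], "Z": [1, 1, 0]})
    for z_state, group in iter_groups_as_tuples(df, ["Z"]):
        print(z_state, len(group))
```

Normalizing the key at the point of iteration keeps downstream code (e.g. indexing into `z_state`) identical for one or several conditioning variables.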
Your environment
The student got the error locally, using pandas>=1.5 and pgmpy==0.1.21. I'll ping them to update this issue with their local Python version and OS.
Steps to reproduce
from pgmpy.estimators.CITests import chi_square
test_result = chi_square(X=X, Y=Y, Z=Z, data=data, boolean=True, significance_level=significance)
Expected behaviour
Get a Boolean output.
Actual behaviour
Error.
@robertness Thanks for reporting the issue, but I don't seem to be able to reproduce the error on pandas 1.5.2. Could you possibly ask your student if they could share some reproducible code where they got the error? Here's my test script where it seems to work fine:
In [16]: import pandas as pd
In [17]: from pgmpy.utils import get_example_model
In [18]: from pgmpy.estimators.CITests import chi_square
In [19]: pd.__version__
Out[19]: '1.5.2'
In [20]: model = get_example_model('asia')
In [21]: data = model.simulate()
Generating for node: dysp: 100%|█████████████████████████████████████████| 8/8 [00:00<00:00, 1676.05it/s]
In [22]: chi_square('asia', 'smoke', ['tub', 'either', 'dysp'], data, significance_level=0.05)
Out[22]: False
Hey @ankurankan, after reinstalling pandas I can no longer reproduce the error. I'm not sure what changed on my end; I thought I had originally tested on 1.5, 1.5.1, and 1.5.2, but all three work fine for me now (apart from a regular FutureWarning). Here's the code I used to test:
from pgmpy.estimators.CITests import chi_square
import pandas as pd

# Tiny test frame; Z is a one-element list, the case the issue suspects
chi_square(
    X="X",
    Y="Y",
    Z=["Z"],
    data=pd.DataFrame({
        "X": [1.5, 1.2, 1.4],
        "Y": [3.1, 2.3, 1.4],
        "Z": [1, 1, 0],
    }),
    significance_level=0.05,
)
Apologies for the noise on this one; it looks like everything is working fine!
@kylejcaron Thanks for confirming. Closing for now.