Shape mismatch when there are two columns and the aggfunc is count or sum
mahaalbashir opened this issue
table= acro.crosstab(df.year, [df.grant_type, df.survivor], values=df.inc_grants, aggfunc="aggfunc", margins=True )
This command produces the error `ValueError: Array conditional must be same shape as self`. The test for this case is the function test_crosstab_with_sum in the test-initial.
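For context, this is the message pandas raises from `DataFrame.where` (and `DataFrame.mask`) when the condition is a bare array whose shape differs from the table. A minimal sketch with made-up data follows; whether acro applies its masks in exactly this way is an assumption, not something confirmed in this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 table: the last column is all zeros.
table = pd.DataFrame({"G": [3.0, 1.0], "N": [2.0, 7.0], "R": [0.0, 0.0]})

# Hypothetical 2x2 mask, i.e. built without the all-zero column.
mask = np.array([[True, False], [False, True]])

try:
    table.where(mask)
    error_message = None
except ValueError as exc:
    # pandas rejects array conditions whose shape differs from the table.
    error_message = str(exc)

print(error_message)
```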
Explanation
When pd.crosstab is used, the handling of columns with zeros depends on the aggfunc:
- Sum or count: columns with zeros are not deleted
- Mean or std: columns with zeros are deleted
The threshold mask is created using the count function which by default (the pandas version) doesn’t delete a column if it is all zeros. So the threshold originally is created with the columns that have zeros in all cells. After the creation of the threshold mask, every column that is all zeros is deleted.
The p-ratio and nk-rule masks by default don’t show columns with zeros. When the command is run:
- If the agg function is mean or std the resulting table is without columns with zeros, so the masks and the table are the same shape.
- If the agg function is sum or count the resulting table is with columns with zeros, so the masks and the tables are not the same shape.
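The difference between the two cases can be reproduced with plain pandas. The sketch below uses a made-up frame (the column names mirror the ones in this issue, the data does not) in which every value for grant type "R" is missing. With sum, the missing values add up to 0 and the column is kept. With mean, the result is NaN everywhere and, as far as I can tell, the default `dropna=True` removes the column.

```python
import numpy as np
import pandas as pd

# Made-up data: every inc_grants value for grant_type "R" is missing.
df = pd.DataFrame(
    {
        "year": [2010, 2010, 2011, 2011, 2010, 2011],
        "grant_type": ["G", "G", "N", "N", "R", "R"],
        "inc_grants": [1.0, 2.0, 3.0, 4.0, np.nan, np.nan],
    }
)

# Sum of all-missing values is 0, so the "R" column survives.
sum_table = pd.crosstab(
    df.year, df.grant_type, values=df.inc_grants, aggfunc="sum"
)

# Mean of all-missing values is NaN, so the "R" column is dropped.
mean_table = pd.crosstab(
    df.year, df.grant_type, values=df.inc_grants, aggfunc="mean"
)

print(sum_table.shape, list(sum_table.columns))
print(mean_table.shape, list(mean_table.columns))
```

If this is what happens with the real data, it could also explain why one pair of columns fails and another works: the mismatch only appears when some column combination has no non-missing values.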
I think the above explanation is not so accurate. I tested the same scenario with different columns and while this command
table= acro.crosstab(df.year, [df.grant_type, df.survivor], values=df.inc_grants, aggfunc="aggfunc", margins=True )
doesn't work and throws an error, replacing the second column df.survivor with df.status
table= acro.crosstab(df.year, [df.grant_type, df.status], values=df.inc_grants, aggfunc="aggfunc", margins=True )
seems to work fine.
The suggested solution, deleting the columns with zeros from the table before applying the masks, will work fine and solve the issue, but I am not sure why different columns resulted in different behavior. @jim-smith
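A minimal sketch of the suggested solution, assuming the table is a plain pandas DataFrame. The helper name `drop_all_zero_columns` is hypothetical and not part of acro, and treating missing cells as zeros is also an assumption.

```python
import numpy as np
import pandas as pd


def drop_all_zero_columns(table: pd.DataFrame) -> pd.DataFrame:
    """Return the table without columns whose cells are all zero.

    Missing cells are treated as zero here (an assumption).
    """
    keep = table.fillna(0).ne(0).any(axis=0)
    return table.loc[:, keep]


# Made-up table shaped like a sum crosstab with one all-zero column.
table = pd.DataFrame(
    {"G": [3.0, np.nan], "N": [np.nan, 7.0], "R": [0.0, 0.0]},
    index=[2010, 2011],
)

trimmed = drop_all_zero_columns(table)
print(list(trimmed.columns))
```

Applying this before the masks would bring the table to the same set of columns as masks that omit all-zero columns, provided both are built from the same data.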