pwwang / datar

A Grammar of Data Manipulation in Python

Home Page: https://pwwang.github.io/datar/

summarize/group_by with categoricals sometimes throws incompatible aggregated result

ftobin opened this issue

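The following will throw a ValueError:

# Imports assumed from the names used in this snippet
from datar.all import f, mutate, as_factor, group_by, summarize, sum_
from datar.datasets import mtcars

(mtcars
    >> mutate(cyl=as_factor(f.cyl))
    >> group_by(f.cyl, f.gear)
    >> summarize(myx=sum_(f.disp * f.hp), _groups="drop"))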
ValueError: `myx` is an incompatible aggregated result.

But this will not (the only difference is the second grouping column, f.am instead of f.gear):

(mtcars
   >> mutate(cyl = as_factor(f.cyl))
   >> group_by(f.cyl, f.am)
   >> summarize(myx = sum_(f.disp*f.hp), _groups="drop"))

I have no idea why one would fail but not the other. Maybe it has something to do with whether the created groups cover every combination of the categorical values.

Note: this fails with mutate too, not just summarize:

ValueError: Incompatible value to recycle.

This is because sum_(f.disp * f.hp), evaluated on the grouped data, generates a result like:

cyl  gear
6    3        52005.0
     4        76429.6
     5        25375.0
4    3        11649.7
     4        64670.5
     5        21693.6
8    3       846042.0
     4            0.0
     5       193499.0
Name: x, dtype: float64

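However, the index (8, 4) shouldn't be there: mtcars has no 8-cylinder cars with 4 gears, so that row is an unobserved combination of the categorical cyl level and gear, filled in with 0.0. The result therefore has more rows than there are actual groups, which would explain the "incompatible" error. Grouping by f.cyl and f.am does not hit this, because every cyl/am combination occurs in the data.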
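The extra row can be reproduced with plain pandas. The following is a minimal sketch on a made-up four-row frame (not mtcars, and not datar's internal code), assuming the behaviour comes from pandas' `observed` option of `groupby`: with a categorical grouping column and `observed=False`, the aggregated result is expanded to every combination of the grouping values, and `sum` fills the empty ones with 0.

```python
import pandas as pd

# Toy data: cyl is categorical, gear is a plain integer column.
# The combinations (6, 3) and (8, 4) never occur.
df = pd.DataFrame({
    "cyl": pd.Categorical([4, 4, 6, 8]),
    "gear": [3, 4, 4, 3],
    "x": [1.0, 2.0, 3.0, 4.0],
})

# observed=False: unobserved combinations are added to the result
all_combos = df.groupby(["cyl", "gear"], observed=False)["x"].sum()

# observed=True: only combinations present in the data are kept
seen_only = df.groupby(["cyl", "gear"], observed=True)["x"].sum()

print(all_combos)
print(seen_only)
```

The first result has one row per cyl/gear pair, including the two that never occur, while the second has only the four pairs actually present. Whether datar should pass `observed=True` internally is a design question for the library; this sketch only illustrates where the phantom (8, 4) row can come from.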