Incorrect output for nested reducers
samukweku opened this issue
Samuel Oranyeli commented
Nesting aggregation calls sometimes produces incorrect results:
>>> from datatable import dt, f
>>> DT = dt.Frame([1, 2])
>>> DT[:, dt.mean(dt.sum(f.C0))]
   |         C0
   |    float64
-- + ----------
 0 | 2.8823e+18
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |          C0
   |     float64
-- + -----------
 0 | 2.31584e+77
[1 row x 1 column]
The expected behavior is:
>>> DT[:, dt.mean(dt.sum(f.C0))]
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |      C0
   | float64
-- + -------
 0 |     1.5
[1 row x 1 column]
datatable version: 1.1
python version: 3.9
operating system: linux
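For reference, the expected values above can be reproduced without datatable. The following is a minimal pure-Python sketch, assuming that the inner reducer collapses the column to a single value and the outer reducer is then applied to that one-element column. It does not use datatable's API; the variable names are illustrative only.

```python
from statistics import mean

# Column C0 from the report above.
values = [1, 2]

# Assumption: the inner reducer collapses the column to one value,
# so the outer reducer only ever sees a one-element column.
inner_sum = [sum(values)]
mean_of_sum = mean(inner_sum)    # corresponds to dt.mean(dt.sum(f.C0))

inner_mean = [mean(values)]
sum_of_mean = sum(inner_mean)    # corresponds to dt.sum(dt.mean(f.C0))

print(mean_of_sum, sum_of_mean)
```

Under this reading, the outer reducer is effectively an identity on a single value, which is why the expected results are simply the sum (3) and the mean (1.5) of the column.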
Samuel Oranyeli commented
Another example, this time for countna():
>>> DT = dt.Frame(G=[1,1,1,2,2,2], V=[None, None, None, None, 3, 5])
>>> DT[:, dt.countna(dt.mean(f.V)), dt.by(f.G)] # wrong output
   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      3
 1 |     2      0
[2 rows x 2 columns]
Expected output:
   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      1
 1 |     2      0
[2 rows x 2 columns]
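The expected counts can be checked by hand with a small pure-Python sketch. It assumes that the per-group mean skips missing values and yields a missing value when the whole group is missing, and that countna() is then applied to that single per-group result. The helpers `mean_skip_na` and `countna` below are hypothetical stand-ins, not datatable functions.

```python
from collections import defaultdict

G = [1, 1, 1, 2, 2, 2]
V = [None, None, None, None, 3, 5]

def mean_skip_na(xs):
    """Mean of the non-missing values; None if every value is missing (assumed semantics)."""
    present = [x for x in xs if x is not None]
    return sum(present) / len(present) if present else None

def countna(xs):
    """Number of missing (None) entries in xs."""
    return sum(x is None for x in xs)

# Split V into groups keyed by G, preserving first-appearance order.
groups = defaultdict(list)
for g, v in zip(G, V):
    groups[g].append(v)

# Inner reducer yields one value per group; the outer reducer counts NAs in it.
result = {g: countna([mean_skip_na(vs)]) for g, vs in groups.items()}
print(result)
```

Group 1 contains only missing values, so its mean is missing and the count is 1. Group 2 has a mean of 4, so the count is 0. The reported value of 3 for group 1 matches the number of missing values in the original column, which suggests the outer reducer was applied to the raw data rather than to the inner result.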
Oleksiy commented
This has been resolved for FExprs. Once all the reducers are converted from Expr to FExpr, the problem should be gone.
Samuel Oranyeli commented
Closing this, as nested reduction is fixed:
DT[:, dt.mean(dt.sum(f.C0))]
Out[3]:
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]
Oleksiy commented
It is only fixed for FExprs and still persists for Exprs. I guess it should only be closed once we get rid of Exprs.