Incorrect output for nested reducers

Question

samukweku opened this issue a year ago

Nesting aggregation calls sometimes produces incorrect results:

>>> from datatable import dt, f
>>> DT = dt.Frame([1, 2])
>>> DT[:, dt.mean(dt.sum(f.C0))]
   |         C0
   |    float64
-- + ----------
 0 | 2.8823e+18
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |          C0
   |     float64
-- + -----------
 0 | 2.31584e+77
[1 row x 1 column]

The expected behavior is:

>>> DT[:, dt.mean(dt.sum(f.C0))]
   |    C0
   | float64
-- + -----
 0 |     3
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |      C0
   | float64
-- + -------
 0 |     1.5
[1 row x 1 column]
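
The expected values can be checked by hand without datatable. The sketch below is plain Python, assuming that a reducer collapses a column to a single value and that the outer reducer is then applied to that one-value column. The helper names `reduce_sum` and `reduce_mean` are hypothetical and not part of datatable's API.

```python
from statistics import mean

# Plain-Python model of the expected semantics (not datatable code):
# each reducer collapses a column to a one-element column, and the
# outer reducer is then applied to that one-element column.
def reduce_sum(column):
    return [sum(column)]

def reduce_mean(column):
    return [mean(column)]

C0 = [1, 2]

mean_of_sum = reduce_mean(reduce_sum(C0))  # mean of the single value 3
sum_of_mean = reduce_sum(reduce_mean(C0))  # sum of the single value 1.5

print(mean_of_sum, sum_of_mean)
```

Under that assumption, the results agree with the expected frames above: 3 for the mean of the sum and 1.5 for the sum of the mean.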

datatable version: 1.1
python version: 3.9
operating system: linux

Samuel Oranyeli · Answer 1 · Tue Mar 14 2023 16:36:35 GMT+0800 (China Standard Time)

Another example, this time for countna():

>>> DT = dt.Frame(G=[1,1,1,2,2,2], V=[None, None, None, None, 3, 5])
>>> DT[:, dt.countna(dt.mean(f.V)), dt.by(f.G)] # wrong output
   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      3
 1 |     2      0
[2 rows x 2 columns]

Expected output:

   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      1
 1 |     2      0
[2 rows x 2 columns]
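
The expected counts can be worked out per group in plain Python. This is only a sketch of the intended semantics, assuming the inner mean skips missing values and yields a missing value for an all-missing group. The helpers `group_mean` and `count_na` are hypothetical names, not datatable functions.

```python
from collections import defaultdict

G = [1, 1, 1, 2, 2, 2]
V = [None, None, None, None, 3, 5]

def group_mean(values):
    # Mean over the non-missing values; an all-missing group gives None.
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def count_na(values):
    # Number of missing entries in a column.
    return sum(v is None for v in values)

# Split V into groups keyed by G, preserving first-seen group order.
groups = defaultdict(list)
for g, v in zip(G, V):
    groups[g].append(v)

# The inner mean produces one value per group; the outer countna then
# counts how many of those single values are missing.
expected = {g: count_na([group_mean(vals)]) for g, vals in groups.items()}
print(expected)
```

Group 1 has no non-missing values, so its mean is missing and the count is 1. Group 2 has a mean of 4, so the count is 0.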

Oleksiy · Answer 2 · Wed Mar 22 2023 06:13:07 GMT+0800 (China Standard Time)

This has been resolved for FExprs. Once all the reducers are converted from Expr to FExpr, the problem should be gone.

Samuel Oranyeli · Answer 3 · Mon Apr 24 2023 10:18:33 GMT+0800 (China Standard Time)

Closing this, as nested reduction is fixed:

DT[:, dt.mean(dt.sum(f.C0))]
Out[3]: 
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]

Oleksiy · Answer 4 · Mon Apr 24 2023 11:24:14 GMT+0800 (China Standard Time)

It is only fixed for FExprs and still persists for Exprs. I guess it should only be closed once we get rid of Exprs.