h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures

Home Page: https://datatable.readthedocs.io

Incorrect output for nested reducers

samukweku opened this issue

Nesting aggregation calls sometimes produces incorrect results:

>>> from datatable import dt, f
>>> DT = dt.Frame([1, 2])
>>> DT[:, dt.mean(dt.sum(f.C0))]
   |         C0
   |    float64
-- + ----------
 0 | 2.8823e+18
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |          C0
   |     float64
-- + -----------
 0 | 2.31584e+77
[1 row x 1 column]

The expected behavior is:

>>> DT[:, dt.mean(dt.sum(f.C0))]
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |      C0
   | float64
-- + -------
 0 |    1.5
[1 row x 1 column]
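
For reference, the expected values above can be reproduced without datatable. This is only a plain-Python sketch of the intended semantics, assuming each reducer collapses its input to a one-row column that the outer reducer then consumes; `reduce_column` is a hypothetical helper, not part of datatable.

```python
from statistics import fmean

def reduce_column(values, reducer):
    """Hypothetical helper: apply a reducer and return a one-row column."""
    return [reducer(values)]

col = [1, 2]  # same data as dt.Frame([1, 2])

# mean(sum(C0)): the inner sum gives [3], the outer mean of [3] is 3.0
mean_of_sum = reduce_column(reduce_column(col, sum), fmean)

# sum(mean(C0)): the inner mean gives [1.5], the outer sum of [1.5] is 1.5
sum_of_mean = reduce_column(reduce_column(col, fmean), sum)

print(mean_of_sum, sum_of_mean)
```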

datatable version: 1.1
python version: 3.9
operating system: linux

Another example, this time for countna():

>>> DT = dt.Frame(G=[1,1,1,2,2,2], V=[None, None, None, None, 3, 5])
>>> DT[:, dt.countna(dt.mean(f.V)), dt.by(f.G)] # wrong output
   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      3
 1 |     2      0
[2 rows x 2 columns]

Expected output:

   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      1
 1 |     2      0
[2 rows x 2 columns]
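
The expected counts can be checked by hand with a small plain-Python sketch. It assumes that the mean skips missing values and is itself missing when a group has no valid values, which is what the expected output above implies; `group_mean` and `count_na` are hypothetical helpers, not datatable functions.

```python
from collections import defaultdict
from statistics import fmean

def group_mean(values):
    """Mean of the non-missing values, or None if there are none (assumed)."""
    present = [v for v in values if v is not None]
    return fmean(present) if present else None

def count_na(values):
    """Number of missing (None) entries."""
    return sum(v is None for v in values)

G = [1, 1, 1, 2, 2, 2]
V = [None, None, None, None, 3, 5]

# Split V into groups keyed by G
groups = defaultdict(list)
for g, v in zip(G, V):
    groups[g].append(v)

# countna(mean(V)) per group: the inner mean yields one value per group,
# so the outer count can only be 0 or 1
expected = {g: count_na([group_mean(vs)]) for g, vs in groups.items()}
print(expected)
```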

This has been resolved for FExprs. Once all the reducers are converted from Expr to FExpr, the problem should be gone.

Closing this, as nested reduction is fixed:

DT[:, dt.mean(dt.sum(f.C0))]
Out[3]: 
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]

It is only fixed for FExprs and still persists for Exprs. I guess this should only be closed once we get rid of Exprs.
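
Until then, a quick check along the following lines could show whether a given build still returns garbage for the two nested calls from the original report. It is only a sketch: `observe_nested_reducers` and `EXPECTED` are hypothetical names, and the check is skipped if datatable is not installed.

```python
try:
    from datatable import dt, f
except ImportError:  # datatable is optional for this sketch
    dt = None

# Expected values taken from the original report
EXPECTED = {"mean_of_sum": 3.0, "sum_of_mean": 1.5}

def observe_nested_reducers():
    """Run the two nested calls from the report; None if datatable is missing."""
    if dt is None:
        return None
    DT = dt.Frame([1, 2])
    return {
        "mean_of_sum": DT[:, dt.mean(dt.sum(f.C0))].to_list()[0][0],
        "sum_of_mean": DT[:, dt.sum(dt.mean(f.C0))].to_list()[0][0],
    }

observed = observe_nested_reducers()
if observed is not None:
    for name, want in EXPECTED.items():
        status = "ok" if observed[name] == want else "MISMATCH"
        print(f"{name}: got {observed[name]}, expected {want} ({status})")
```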