rapidsai / node

GPU-accelerated data science and visualization in node

Home Page: https://rapidsai.github.io/node/

[FEA] Add multi-aggregation support to DataFrame groupBy

trxcllnt opened this issue

Users should be able to do multiple aggregations on a GroupBy:

import {DataFrame, Series} from '@rapidsai/cudf';

const df = new DataFrame({
  a: Series.new([0, 0]),
  b: Series.new([0, 100]),
  c: Series.new([0, 100]),
});

df.groupBy({by: "a"}).aggregate({ min: ["b", "c"], max: "c" });

// a | b_min | c_min | c_max
// 0 |     0 |     0 |   100
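The naming rule implied by this first proposal can be sketched in plain TypeScript. This is only an illustration of how `<column>_<aggregation>` names could be derived from the request object; `AggRequest` and `outputColumnNames` are hypothetical names, not part of the library.

```typescript
// Hypothetical helper, not part of @rapidsai/cudf: derives the output
// column names for the first proposed API shape.
type AggRequest = Record<string, string | string[]>;

function outputColumnNames(request: AggRequest): string[] {
  const names: string[] = [];
  for (const [agg, columns] of Object.entries(request)) {
    // A single column may be given as a bare string; normalize it to a list.
    const columnList = Array.isArray(columns) ? columns : [columns];
    for (const column of columnList) {
      names.push(`${column}_${agg}`);
    }
  }
  return names;
}

const names = outputColumnNames({ min: ["b", "c"], max: "c" });
console.log(names); // ["b_min", "c_min", "c_max"]
```

The drawback is that the caller has no control over the generated names.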

After some discussion, it was decided that it would be better to have users supply their own column names up front:

const df = new DataFrame({
  a: Series.new([0, 0]),
  b: Series.new([0, 100]),
  c: Series.new([0, 100]),
});

df.groupBy({by: "a"}).aggregate({
  b: { min: "min_b", max: "foobar" },
  c: { count: "stuff" },
});

// a | min_b | foobar | stuff
// 0 |     0 |    100 |     2
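A column-first request like this carries the same information as an aggregation-first one. A minimal sketch of converting between the two nestings, assuming both are plain nested objects (`NestedSpec` and `swapOuterKeys` are hypothetical names used only for illustration):

```typescript
// Hypothetical helper, not part of @rapidsai/cudf: swaps the two levels
// of nesting in an aggregation request.
type NestedSpec = Record<string, Record<string, string>>;

function swapOuterKeys(spec: NestedSpec): NestedSpec {
  const swapped: NestedSpec = {};
  for (const [outerKey, inner] of Object.entries(spec)) {
    for (const [innerKey, outputName] of Object.entries(inner)) {
      if (!swapped[innerKey]) {
        swapped[innerKey] = {};
      }
      swapped[innerKey][outerKey] = outputName;
    }
  }
  return swapped;
}

// Column-first request, as in the proposal above.
const columnFirst: NestedSpec = {
  b: { min: "min_b", max: "foobar" },
  c: { count: "stuff" },
};

const aggregationFirst = swapOuterKeys(columnFirst);
console.log(aggregationFirst);
// { min: { b: "min_b" }, max: { b: "foobar" }, count: { c: "stuff" } }
```

Applying `swapOuterKeys` twice returns the original nesting, so neither shape loses information. The choice between them is about ergonomics and typing.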

After running into typing problems while experimenting with the above, the latest API attempt swaps the aggregations to the outermost keys:

const df = new DataFrame({
  a: Series.new([0, 0]),
  b: Series.new([0, 100]),
  c: Series.new([0, 100]),
});

df.groupBy({by: "a"}).aggregate({
  min: { b: "min_b", c: "stuff" },
  max: { b: "foobar" },
});

// a | min_b | foobar | stuff
// 0 |     0 |    100 |     0
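To make the intended semantics of the aggregation-first shape concrete, here is a rough in-memory sketch over plain number arrays. It is not the GPU implementation and does not use `@rapidsai/cudf`; `Table`, `AggSpec`, `reducers`, and `groupByAggregate` are all hypothetical names, and the set of supported aggregations is an assumption.

```typescript
// Hypothetical CPU stand-in for illustration only.
type Table = Record<string, number[]>;
type AggName = "min" | "max" | "sum" | "count";
// Outer keys are aggregations; inner keys map source column -> output name.
type AggSpec = Partial<Record<AggName, Record<string, string>>>;

const reducers: Record<AggName, (values: number[]) => number> = {
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  sum: (values) => values.reduce((acc, x) => acc + x, 0),
  count: (values) => values.length,
};

function groupByAggregate(table: Table, by: string, spec: AggSpec): Table {
  // Collect the row indices of each distinct key, in first-seen order.
  const groups = new Map<number, number[]>();
  table[by].forEach((key, row) => {
    const rows = groups.get(key);
    if (rows) {
      rows.push(row);
    } else {
      groups.set(key, [row]);
    }
  });

  const groupRows = Array.from(groups.values());
  const out: Table = { [by]: Array.from(groups.keys()) };
  for (const agg of Object.keys(spec) as AggName[]) {
    const mapping = spec[agg] ?? {};
    for (const [source, outputName] of Object.entries(mapping)) {
      // One reduced value per group, stored under the user-supplied name.
      out[outputName] = groupRows.map((rows) =>
        reducers[agg](rows.map((row) => table[source][row])));
    }
  }
  return out;
}

const result = groupByAggregate(
  { a: [0, 0], b: [0, 100], c: [0, 100] },
  "a",
  { min: { b: "min_b", c: "stuff" }, max: { b: "foobar" } },
);
console.log(result);
```

With the example data this yields one group (`a = 0`) with `min_b = 0`, `stuff = 0`, and `foobar = 100`, matching the table above. In this sketch the output columns follow the order in which the spec is traversed, so the column order may differ from the table.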