Consider using ml-bucketeer for grouping constraint
grtjn opened this issue
Using a UDF can significantly improve performance when computing aggregates over data. The current grouping implementation can become slow when the number of groups grows large, particularly when the dataset is large and contains many unique values. Leveraging a UDF could boost performance in that case: