Consider using ml-bucketeer for grouping constraint

Question

grtjn opened this issue 9 years ago

Using a UDF can greatly improve performance when computing aggregates over data. The current grouping implementation can become slow when the number of groups gets large, particularly if the dataset is large and contains a lot of unique values. Leveraging a UDF could boost performance in that case:

https://github.com/ryanjdew/ml-bucketeer