How do I efficiently query for unique values of a field?
ami-m opened this issue
Ami Malimovka commented
Say I get a stream of data: {machineCode: "", lat: , lon: }
and I want to display a count of such data points per machineCode.
Is there a way to efficiently get all the unique machine codes, or should I just keep track of them while inserting data?
Roman Atachiants commented
There's no built-in feature in column for this, but there are two ways I can think of to solve this problem:
- if you're okay with an approximate result, feed the machine codes into a HyperLogLog (note that it only estimates how many distinct codes there are; it cannot list them)
- otherwise, a standard map/set is required
Either can be populated during insertion, or by a range query that iterates over all elements.
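For the map/set option, here is a minimal sketch in plain Go. It deliberately avoids calling into column itself: `Datum`, `CodeCounter`, and the `each` callback are hypothetical names made up for illustration, and the field types are assumed from the record shape in the question. `Observe` is meant for the insert path, while `Rebuild` recomputes everything from a full scan, with `each` standing in for whatever range query the store provides.

```go
package main

import "sort"

// Datum mirrors the record shape from the question.
// The field types are an assumption, not taken from the thread.
type Datum struct {
	MachineCode string
	Lat, Lon    float64
}

// CodeCounter keeps an exact count of datums per machine code.
// It is not safe for concurrent use; guard it with a mutex if needed.
type CodeCounter struct {
	counts map[string]int
}

// NewCodeCounter returns an empty counter.
func NewCodeCounter() *CodeCounter {
	return &CodeCounter{counts: make(map[string]int)}
}

// Observe records one datum. Call it alongside every insert.
func (c *CodeCounter) Observe(d Datum) {
	c.counts[d.MachineCode]++
}

// Rebuild discards the current counts and recomputes them from a full scan.
// The each parameter is a hypothetical adapter around the store's iteration:
// it must call visit once for every stored datum.
func (c *CodeCounter) Rebuild(each func(visit func(Datum))) {
	c.counts = make(map[string]int)
	each(c.Observe)
}

// Codes returns the unique machine codes in sorted order.
func (c *CodeCounter) Codes() []string {
	codes := make([]string, 0, len(c.counts))
	for code := range c.counts {
		codes = append(codes, code)
	}
	sort.Strings(codes)
	return codes
}

// Count returns how many datums were observed for the given code.
func (c *CodeCounter) Count(code string) int {
	return c.counts[code]
}
```

Keeping the map updated on insert makes both the list of unique codes and the per-code counts available without touching the store. The trade-off is that the map lives outside the store, so it has to be repopulated with something like `Rebuild` whenever the store's contents are replaced.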
Ami Malimovka commented
Thanks, I went with the second method, but that leaves me having to do the range query when restoring state from a snapshot :-(