kelindar / column

High-performance, columnar, in-memory store with bitmap indexing in Go

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

how do I efficiently query for unique values of a field

ami-m opened this issue · comments

say I get a stream of data: {machineCode: "", lat: , lon: }
And I want to display a count of such datums per machineCode.

Is there a way to efficiently get all the unique machine codes? or should I just keep track of them while inserting data?

No built-in feature in column for this, but there's 2 ways I can think of to solve this problem:

  1. if you're okay with imprecise measurement, use HyperLogLog to store machine codes
  2. otherwise, a standard map/set is required

You can do both during insertion or a range query that iterates over all elements.

thanks, I went with the second method, but that leaves me with having to do the range query when restoring state from a snapshot :-(