RoaringBitmap / CRoaring

Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks

Home Page: http://roaringbitmap.org/


Improve lazyor in CRoaring

baibaichen opened this issue

I am currently working on a case where I need to union multiple serialized roaring bitmaps. I found two points we can learn from the Java version.

  1. This PR introduces naivelazyor, but I didn't find a corresponding implementation in the C version. Is this an oversight, or is there a technical reason? By the way, why does naivelazyor always use a bitset?
  2. Surprisingly, I found memory freeing to be a hotspot in my case, presumably because deserialization creates a lot of containers. I believe working directly on the serialized bitmap, as ImmutableRoaringBitmap does, would be a solution.

Any suggestions?

This PR (RoaringBitmap/RoaringBitmap#118) introduces naivelazyor, but I didn't find a corresponding implementation in the C version.

There are several implementations of Roaring bitmaps (in C, Rust, Go, Java, and others), and they have different optimizations. In CRoaring, we are always open to better optimizations. Pull requests are invited!
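For readers wondering what a "lazy" union into a bitset buys you, here is a minimal sketch of the general idea for a single 16-bit chunk: keep the accumulator as a 65536-bit bitset, OR every input into it, and postpone the cardinality count until all inputs have been merged. This is not a port of the Java naivelazyor and not CRoaring API; the names `lazy_chunk_t`, `lazy_chunk_or_array` and `lazy_chunk_repair` are hypothetical. CRoaring itself exposes `roaring_bitmap_lazy_or_inplace` and `roaring_bitmap_repair_after_lazy`, which follow a similar "merge now, fix up later" pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_WORDS 1024 /* 65536 bits / 64 bits per word */

/* Hypothetical accumulator for one 16-bit chunk. It is always a bitset,
 * and its cardinality is unknown until lazy_chunk_repair() is called. */
typedef struct {
    uint64_t words[CHUNK_WORDS];
    int32_t cardinality; /* -1 means "not computed yet" */
} lazy_chunk_t;

void lazy_chunk_init(lazy_chunk_t *c) {
    memset(c->words, 0, sizeof(c->words));
    c->cardinality = -1;
}

/* Lazy union with an array of 16-bit values: set the bits, and do not
 * touch the cardinality. The cost per value is a single OR. */
void lazy_chunk_or_array(lazy_chunk_t *c, const uint16_t *vals, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c->words[vals[i] >> 6] |= UINT64_C(1) << (vals[i] & 63);
    }
    c->cardinality = -1;
}

/* Portable population count (clears one set bit per iteration). */
int popcount64(uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Done once, after all inputs are merged: compute the cardinality. */
int32_t lazy_chunk_repair(lazy_chunk_t *c) {
    int32_t total = 0;
    for (size_t i = 0; i < CHUNK_WORDS; i++) {
        total += popcount64(c->words[i]);
    }
    c->cardinality = total;
    return total;
}
```

With many inputs, each value costs one OR and the count is paid once at the end, instead of recounting or converting container types after every pairwise union. The trade-off is 8 KiB per chunk regardless of how sparse the result turns out to be.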

Surprisingly, I found memory freeing to be a hotspot in my case, presumably because deserialization creates a lot of containers. I believe working directly on the serialized bitmap, as ImmutableRoaringBitmap does, would be a solution.

Java has the notion of buffers, which can be used to memory-map an arbitrary region. This functionality does not exist in C or most other languages, though it can be implemented, of course. Advanced CRoaring users can use the frozen format. The Java approach would also be possible in C, but it requires some work. Your contribution is invited.
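To make the "work directly on the serialized bytes" idea concrete, here is a minimal sketch of a read-only view that points into an existing buffer, so opening it allocates nothing and there is nothing to free afterwards. The layout is a hypothetical toy (a 32-bit count followed by sorted 16-bit values, in native byte order), not the Roaring portable format and not the frozen format; `toy_view_t` and its functions are made up for illustration. In CRoaring, the frozen-format entry point to look at is `roaring_bitmap_frozen_view`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout (NOT the Roaring or frozen format):
 *   4 bytes          : count, uint32_t, native byte order
 *   count * 2 bytes  : sorted uint16_t values, native byte order */
typedef struct {
    const uint8_t *values; /* points into the caller's buffer */
    uint32_t count;
} toy_view_t;

/* Opens a view over buf without copying or allocating.
 * Returns 1 on success, 0 if the buffer is too short for its header
 * or for the number of values it claims to hold. */
int toy_view_open(toy_view_t *v, const uint8_t *buf, size_t len) {
    uint32_t count;
    if (len < sizeof(count)) return 0;
    memcpy(&count, buf, sizeof(count));
    if ((len - sizeof(count)) / sizeof(uint16_t) < count) return 0;
    v->values = buf + sizeof(count);
    v->count = count;
    return 1;
}

/* Reads the i-th value; memcpy avoids any alignment assumption. */
uint16_t toy_view_at(const toy_view_t *v, uint32_t i) {
    uint16_t x;
    memcpy(&x, v->values + (size_t)i * sizeof(x), sizeof(x));
    return x;
}

/* Membership test by binary search over the serialized values. */
int toy_view_contains(const toy_view_t *v, uint16_t target) {
    uint32_t lo = 0, hi = v->count;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        uint16_t x = toy_view_at(v, mid);
        if (x == target) return 1;
        if (x < target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return 0;
}
```

The view is only valid while the underlying buffer stays alive and unmodified, which is the same constraint a memory-mapped or frozen bitmap imposes.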