RoaringBitmap / CRoaring

Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks

Home Page: http://roaringbitmap.org/

Implement roaring64_bitmap_statistics

AviAvni opened this issue · comments

roaring_bitmap_statistics exists; can we also get it for roaring64?

@AviAvni Would you be interested in proposing a pull request?

@lemire I tried it in #617; please help with the review.

Is the sum of n_bytes_array_containers, n_bytes_run_containers and n_bytes_bitset_containers the total number of bytes allocated, or is there more to consider?

@Dr-Emann provided a review. :-)

@lemire And what about my question on how much memory is allocated?

Are you asking what these numbers mean in the 32-bit statistics struct?

    uint32_t n_bytes_array_containers;  /* number of allocated bytes in array
                                           containers */
    uint32_t n_bytes_run_containers;    /* number of allocated bytes in run
                                           containers */
    uint32_t n_bytes_bitset_containers; /* number of allocated bytes in bitmap
                                           containers */

Or do you mean to refer to something else?

Do you mean to refer to this...

size_t roaring64_bitmap_portable_size_in_bytes(const roaring64_bitmap_t *r);

It is somewhat difficult to say, in absolute terms, how much memory a piece of code uses. You need to count the number of pages actually accessed. That is best done in client code; it is not a service we should provide.

What we can provide is useful internal statistics.

I'm asking whether these three uint32_t fields account for all of the heap memory allocated for a roaring bitmap, or whether more memory is allocated that does not appear in the statistics.

I think I have answered this, but let me clarify in case it is not yet clear:

  • n_bytes_array_containers is the number of allocated bytes in array containers
  • n_bytes_run_containers is the number of allocated bytes in run containers
  • n_bytes_bitset_containers is the number of allocated bytes in bitmap containers

The sum of these three values can be either less (yes, less) or more than the total heap memory usage that can be attributed to the Roaring bitmap you are holding.
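For reference, the sum being asked about could be computed along the lines of the sketch below. The struct is a local stand-in (the name container_bytes_t and the helper container_bytes_total are hypothetical) so that the snippet compiles without CRoaring; in real code the three fields would be read from the statistics struct filled in by roaring_bitmap_statistics. The result is a count of container bytes only, not a measurement of heap usage.

```c
#include <stdint.h>

/* Stand-in for the three byte counters quoted in this thread. In real code
   these fields are filled in by roaring_bitmap_statistics(); this local
   struct (hypothetical name) exists only so the sketch compiles on its own. */
typedef struct {
    uint32_t n_bytes_array_containers;
    uint32_t n_bytes_run_containers;
    uint32_t n_bytes_bitset_containers;
} container_bytes_t;

/* Add the three counters in 64 bits so the total cannot wrap around even
   when each 32-bit field is large. */
uint64_t container_bytes_total(const container_bytes_t *s) {
    return (uint64_t)s->n_bytes_array_containers +
           (uint64_t)s->n_bytes_run_containers +
           (uint64_t)s->n_bytes_bitset_containers;
}
```

Widening to uint64_t before adding is a deliberate choice here: three 32-bit counters can add up to more than a uint32_t can hold.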

Computing the memory usage of the bitmap is not the purpose of these functions.

For example, you might be using copy-on-write and have a seemingly enormous bitmap that uses no memory at all. Then it matters whether you have called roaring_bitmap_shrink_to_fit or not. And so forth. It is beyond the scope of these statistics functions to measure, for example, the number of memory pages touched and allocated by data from this bitmap. That is best done at the application level.
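To illustrate why sharing makes per-object byte counts misleading, here is a minimal sketch of a generic reference-counted buffer. Every name in it (shared_buf_t, handle_t, handle_create, handle_share, handle_release, handle_reported_bytes) is hypothetical, and it is not how CRoaring implements copy-on-write internally; it only shows that two handles can each report the same bytes while the heap holds them once.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer, used only to illustrate sharing.
   This is not CRoaring's internal representation. */
typedef struct {
    size_t refcount;
    size_t n_bytes;
    unsigned char *data;
} shared_buf_t;

/* A handle that points at a (possibly shared) buffer. */
typedef struct {
    shared_buf_t *buf;
} handle_t;

/* Allocate a fresh buffer owned by one handle; buf is NULL on failure. */
handle_t handle_create(size_t n_bytes) {
    handle_t h = {NULL};
    shared_buf_t *b = malloc(sizeof *b);
    if (b == NULL) return h;
    b->data = calloc(n_bytes, 1);
    if (b->data == NULL) {
        free(b);
        return h;
    }
    b->refcount = 1;
    b->n_bytes = n_bytes;
    h.buf = b;
    return h;
}

/* "Copy" a handle without copying any data: only the count goes up. */
handle_t handle_share(handle_t src) {
    if (src.buf != NULL) src.buf->refcount++;
    return src;
}

/* Drop one reference; the buffer is freed when the last one goes away. */
void handle_release(handle_t h) {
    if (h.buf != NULL && --h.buf->refcount == 0) {
        free(h.buf->data);
        free(h.buf);
    }
}

/* What a naive per-handle statistic would report. */
size_t handle_reported_bytes(handle_t h) {
    return h.buf != NULL ? h.buf->n_bytes : 0;
}
```

If a is created with 8192 bytes and b is obtained with handle_share(a), adding handle_reported_bytes for both gives 16384, even though only one 8192-byte buffer was allocated.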

And new char[1] does not use 1 byte on the heap. How much it uses is hard to tell in the abstract.
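The same point can be checked in C with malloc(1). The sketch below assumes glibc, where malloc_usable_size (a glibc extension declared in malloc.h, not standard C) reports how many bytes are usable in an allocated block; on other platforms the helper simply returns 0. The function names usable_size_supported and usable_bytes_for_one_byte_request are made up for this example. Even the reported value leaves out the allocator's own bookkeeping, so it is still not a complete answer.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h> /* malloc_usable_size, a glibc extension */
#endif

/* Returns 1 when this build can query the usable size of a block. */
int usable_size_supported(void) {
#if defined(__GLIBC__)
    return 1;
#else
    return 0;
#endif
}

/* Returns the number of bytes the allocator made usable for a 1-byte
   request, or 0 if the allocation failed or the platform offers no way
   to ask. */
size_t usable_bytes_for_one_byte_request(void) {
    size_t usable = 0;
    void *p = malloc(1);
    if (p == NULL) return 0;
#if defined(__GLIBC__)
    usable = malloc_usable_size(p);
#endif
    free(p);
    return usable;
}
```

The exact number depends on the allocator and the platform, which is why it is hard to state in the abstract.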