apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.

Home Page: https://datasketches.apache.org

NPE when trying to grow the buffer in DoublesSketch

jihoonson opened this issue

Hi, Druid uses the QuantilesSketch to compute approximate quantiles. To store the approximate quantiles, Druid creates a DoublesUnion backed by a WritableMemory that wraps a DirectByteBuffer. Since Druid does not know in advance how many items there will be, but the buffer size must be fixed, it estimates an initial Memory size large enough to hold one billion items. The code snippet below shows this; it can be found at https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchMergeBufferAggregatorHelper.java#L47-L53.

  public void init(final ByteBuffer buffer, final int position)
  {
    final WritableMemory mem = getMemory(buffer);
    final WritableMemory region = mem.writableRegion(position, maxIntermediateSize);
    final DoublesUnion union = DoublesUnion.builder().setMaxK(k).build(region);
    putUnion(buffer, position, union);
  }

This causes a problem: DirectUpdateDoublesSketch throws an NPE at this line when more than one billion items are actually added to the union (reported in apache/druid#11544). DirectUpdateDoublesSketch tried to allocate extra memory to hold more items than were initially estimated, but its WritableMemory was a BBWritableMemoryImpl (because it was created by wrapping a DirectByteBuffer), which returns null from getMemoryRequestServer(). I thought the fix could be to return a valid memoryRequestServer from BBWritableMemoryImpl, but the Javadoc of WritableMemory.getMemoryRequestServer states that this method is supposed to return null for non-direct memory, so I was not sure what the right fix would be. What is the reason for non-direct memory to return null from getMemoryRequestServer()?
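
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the pattern described above. It deliberately does not use the datasketches-memory classes: `RequestServer`, `Region`, `growUnchecked`, and `growChecked` are hypothetical stand-ins invented for illustration, and the checked variant is only one possible way to surface the problem, not the fix that was eventually made in the library.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-ins for illustration only; these are NOT the
// datasketches-memory types or the library's actual growth logic.
public class GrowSketch {

  /** Hypothetical callback that hands out a larger buffer on request. */
  interface RequestServer {
    ByteBuffer request(int newCapacityBytes);
  }

  /** Hypothetical fixed-size region that may or may not know how to grow. */
  static final class Region {
    final ByteBuffer buf;
    final RequestServer server; // null models the situation described in the issue

    Region(ByteBuffer buf, RequestServer server) {
      this.buf = buf;
      this.server = server;
    }
  }

  /** Unchecked growth: dereferences the server without a null check. */
  static Region growUnchecked(Region r, int newCapacityBytes) {
    ByteBuffer bigger = r.server.request(newCapacityBytes); // NPE when server is null
    return copyInto(r, bigger);
  }

  /** Checked growth: fails with a descriptive exception instead of an NPE. */
  static Region growChecked(Region r, int newCapacityBytes) {
    if (r.server == null) {
      throw new IllegalStateException(
          "Region is full and has no request server; allocate a larger buffer up front");
    }
    return copyInto(r, r.server.request(newCapacityBytes));
  }

  /** Copies the full contents of the old buffer into the new, larger one. */
  private static Region copyInto(Region r, ByteBuffer bigger) {
    ByteBuffer src = r.buf.duplicate();
    src.clear(); // position = 0, limit = capacity, so the whole buffer is copied
    bigger.put(src);
    return new Region(bigger, r.server);
  }

  public static void main(String[] args) {
    // A region wrapping a direct buffer with no way to request more memory.
    Region fixed = new Region(ByteBuffer.allocateDirect(16), null);
    try {
      growUnchecked(fixed, 32);
    } catch (NullPointerException e) {
      System.out.println("unchecked growth failed with an NPE");
    }

    // The same region with a request server can grow normally.
    Region growable = new Region(ByteBuffer.allocateDirect(16), ByteBuffer::allocateDirect);
    Region grown = growChecked(growable, 32);
    System.out.println("grown capacity: " + grown.buf.capacity());
  }
}
```

In this toy model the caller either has to supply a request server or size the buffer generously enough that growth is never needed.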

Reference: apache/druid#11544

We can reproduce this bug and thank you for reporting it. As there may be several ways to fix this, we need to consider what the best way would be. We are about to release a new Memory component and we may choose to implement the fix with the new release. Also, we will need to release a new datasketches-java component as well to take advantage of this fix. Doing two releases will take some time.

In the meantime, one possible workaround would be for you to allocate a larger ByteBuffer chunk for the sketch as Alex has suggested.

If we can figure out a simple hack, we might be able to supply you with a temporary hacked jar, if you can accommodate that. We will let you know if that is feasible.

This bug should now be fixed in the latest Memory 2.0.0 release. We are in the process of releasing DataSketches-java 3.0.0. Once this is released, the DataSketches Druid adaptor will need to be updated and then incorporated in a new Druid release (please be patient!).
Cheers,
Lee.

@leerho thanks for the update. Should we update DataSketches-java and DataSketches-memory at the same time? Also, out of curiosity, can you point me to the commit or PR in which the bug was fixed? (Is it apache/datasketches-memory#139?) Just wondering what the fix is.

The actual fix was rather complex and consists of several commits and PRs 136, 137 and 138. The key commit for the fix in PR136 was:

And the explanation of what happened is in this commit in PR138 with the confirming reproducing test:

Understanding the details of the fix will require a rather deep understanding of the architecture of ds-memory. Good luck :)


Yes, ds-java depends on ds-memory so you will need to upgrade them both at the same time when they are both available. Do not upgrade to ds-memory 2.0 until you have the ds-java release 3.0, because there are some internal changes to ds-java to take advantage of the new ds-memory.

Thanks for the pointers. I will take a look at those commits 🙂. I will also bump up the versions of DataSketches-java and memory when 3.0.0 is released. I will close this issue since the bug is technically fixed in the memory 2.0.0 release. Thanks for fixing this bug!