apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.

Home Page: https://datasketches.apache.org

Is there any way to convert CompactSketch to UpdateSketch?

yunfan123 opened this issue

I want to convert a CompactSketch to an UpdateSketch.

It is not possible. The compact form does not have sufficient information for that to be possible.

However, you can union a CompactSketch with any other theta sketch, with no accuracy loss compared with building everything in a single sketch, so you can use that approach to add items to a compact sketch. It just cannot be done directly.
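To illustrate why a union need not lose accuracy relative to a single sketch, here is a toy k-minimum-values (KMV) sketch in plain Java. It is a simplified stand-in and not the DataSketches implementation: the class `ToyKmvSketch`, its methods, and the hash mixer are all hypothetical names chosen for illustration, and the library's theta sketches use different internal update schemes. The point it shows is that the k smallest hashes of a combined stream are always contained in the k smallest hashes of each part, so merging two sketches reproduces the sketch a single pass would have built.

```java
import java.util.TreeSet;

/** Toy k-minimum-values sketch; a simplified illustration, NOT the DataSketches code. */
public class ToyKmvSketch {
    private final int k;                                  // maximum number of retained hashes
    private final TreeSet<Long> hashes = new TreeSet<>(); // the k smallest hashes seen so far

    public ToyKmvSketch(int k) { this.k = k; }

    // Hypothetical 64-bit mixer (SplitMix64-style), masked to a non-negative long.
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return (x ^ (x >>> 31)) & Long.MAX_VALUE;
    }

    public void update(long item) { offer(hash(item)); }

    // Insert a hash and evict the largest one if more than k are retained.
    private void offer(long h) {
        hashes.add(h);
        if (hashes.size() > k) { hashes.pollLast(); }
    }

    // Merge two sketches: keep the k smallest hashes from both retained sets.
    public ToyKmvSketch union(ToyKmvSketch other) {
        ToyKmvSketch out = new ToyKmvSketch(Math.min(k, other.k));
        for (long h : hashes) { out.offer(h); }
        for (long h : other.hashes) { out.offer(h); }
        return out;
    }

    // Distinct-count estimate: exact below k entries, (k - 1) / theta otherwise.
    public double estimate() {
        if (hashes.size() < k) { return hashes.size(); }
        double theta = hashes.last() / (double) Long.MAX_VALUE;
        return (k - 1) / theta;
    }

    // Defensive copy of the retained hashes, for inspection.
    public TreeSet<Long> retained() { return new TreeSet<>(hashes); }
}
```

Feeding items 0..4999 into one toy sketch and 3000..9999 into another, then calling `union`, yields the same retained hashes (and therefore the same estimate) as a single toy sketch fed 0..9999, provided all three use the same `k`.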

OK. Is there any way to union two UpdateSketches into one UpdateSketch?
It seems the union only supports producing a CompactSketch.

@yunfan123

The Union is a streaming operator designed for high performance when unioning a stream of sketches, where the stream can contain many thousands or millions of sketches. It is also designed to accept many different "flavors" of the base Theta Sketch: AlphaSketch, QuickSelectSketch, CompactSketch, DirectQuickSelectSketch, DirectCompactSketch, and binary Memory images of the above variants. To do all that with high performance, we use optimization techniques that allow the internals of the Union to be in a non-finalized state in between updates. It is the getResult() methods that perform the finalization. The only type of sketch that is mathematically correct to return, given all the different combinations of inputs, is the CompactSketch.

You can always do what @jmalkin suggested above, or you can do "item" updates directly on the Union, where it behaves similarly to the UpdateSketch, and then call getResult() at the end.
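A minimal sketch of both suggestions, assuming a datasketches-java release in which the classes live in `org.apache.datasketches.theta` and the Union exposes `union(Sketch)` (older releases used `update(Sketch)` for the same purpose, so adjust to your version). The class name `UnionAsUpdatable` and its methods are hypothetical and exist only for this illustration.

```java
import org.apache.datasketches.theta.CompactSketch;
import org.apache.datasketches.theta.SetOperation;
import org.apache.datasketches.theta.Union;
import org.apache.datasketches.theta.UpdateSketch;

/** Hypothetical helper: treat a Union as the "updatable" holder for a compact sketch. */
public class UnionAsUpdatable {

    // Adds new items to the contents of an existing CompactSketch via a Union.
    public static CompactSketch extend(CompactSketch existing, UpdateSketch other, long[] newItems) {
        Union union = SetOperation.builder().buildUnion();
        union.union(existing);      // assumption: 2.x+ API; older releases: union.update(existing)
        union.union(other);         // any theta sketch flavor can be unioned in
        for (long item : newItems) {
            union.update(item);     // "item" updates go straight into the Union
        }
        return union.getResult();   // finalization; the result is always a CompactSketch
    }

    public static CompactSketch demo() {
        UpdateSketch first = UpdateSketch.builder().build();
        for (long i = 0; i < 100; i++) { first.update(i); }
        CompactSketch compact = first.compact();   // no longer updatable

        UpdateSketch second = UpdateSketch.builder().build();
        for (long i = 50; i < 150; i++) { second.update(i); }

        // 0..149 from the two sketches, plus 150 and 151; 7 is a duplicate.
        return extend(compact, second, new long[] {150L, 151L, 7L});
    }
}
```

With the default sketch size this stays in exact mode, so `demo().getEstimate()` should report 152 distinct items. If more items arrive later, keep the Union around and continue updating it rather than trying to turn the result back into an UpdateSketch.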

If your situation is such that you are always just unioning two sketches together, be aware that your speed performance will not be optimal.