apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.

Home Page: https://datasketches.apache.org


[UpdateDoublesSketch] More heap allocation than `0.8.3`

shoothzj opened this issue · comments

Hi, I noticed that 3.2.0 allocates more heap than 0.8.3 did. Below are the detailed test data.

3.2.0

import lombok.extern.slf4j.Slf4j;
import org.apache.datasketches.quantiles.DoublesSketchBuilder;
import org.apache.datasketches.quantiles.UpdateDoublesSketch;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Slf4j
@State(Scope.Benchmark)
public class UpdateDoubleSketchDemo {

    private final UpdateDoublesSketch doublesSketch = new DoublesSketchBuilder().build();

    @Benchmark
    public void benchMark() {
        doublesSketch.update(RandomUtil.randomDouble());
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(UpdateDoubleSketchDemo.class.getSimpleName())
                .addProfiler(GCProfiler.class)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

It shows:

Benchmark                                                       Mode  Cnt         Score        Error   Units
UpdateDoubleSketchDemo.benchMark                               thrpt    5  21829812.630 ± 832954.814   ops/s
UpdateDoubleSketchDemo.benchMark:·gc.alloc.rate                thrpt    5         8.269 ±      0.322  MB/sec
UpdateDoubleSketchDemo.benchMark:·gc.alloc.rate.norm           thrpt    5         0.417 ±      0.001    B/op
UpdateDoubleSketchDemo.benchMark:·gc.churn.G1_Eden_Space       thrpt    5        11.421 ±     98.334  MB/sec
UpdateDoubleSketchDemo.benchMark:·gc.churn.G1_Eden_Space.norm  thrpt    5         0.575 ±      4.953    B/op
UpdateDoubleSketchDemo.benchMark:·gc.count                     thrpt    5         1.000               counts
UpdateDoubleSketchDemo.benchMark:·gc.time                      thrpt    5         2.000                   ms

0.8.3

import com.yahoo.sketches.quantiles.DoublesSketch;
import com.yahoo.sketches.quantiles.DoublesSketchBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
public class UpdateDoubleSketchDemo {

    private final DoublesSketch doublesSketch = new DoublesSketchBuilder().build();

    @Benchmark
    public void benchMark() {
        doublesSketch.update(RandomUtil.randomDouble());
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(UpdateDoubleSketchDemo.class.getSimpleName())
                .addProfiler(GCProfiler.class)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

It shows:

Benchmark                                              Mode  Cnt         Score        Error   Units
UpdateDoubleSketchDemo.benchMark                      thrpt    5  22132550.180 ± 695427.053   ops/s
UpdateDoubleSketchDemo.benchMark:·gc.alloc.rate       thrpt    5         0.001 ±      0.004  MB/sec
UpdateDoubleSketchDemo.benchMark:·gc.alloc.rate.norm  thrpt    5        ≈ 10⁻⁵                 B/op
UpdateDoubleSketchDemo.benchMark:·gc.count            thrpt    5           ≈ 0               counts

And the RandomUtil:

public class RandomUtil {

    private static final Random RANDOM = new Random();

    public static double randomDouble() {
        // Uniform double in [0, 500)
        return 500 * RANDOM.nextDouble();
    }

}

This doesn't surprise me. I expect it's a side-effect of sharing code between the heap and direct instances -- I don't even see a direct implementation in the code from 5.5 years ago.

To allow various mixes of heap and direct sketches to be merged without a significant increase in nearly identical code, we added a layer of indirection when reading them. That definitely imposed a performance cost, but it gained significant flexibility without turning into a maintenance nightmare.
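To illustrate the pattern (these are hypothetical names for illustration only, not the actual datasketches-java classes): merge and read code is written once against a small read interface, and heap-backed and off-heap-backed implementations sit behind it, at the cost of a virtual call per access.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the heap/direct indirection described above.
// The interface and class names here are NOT real datasketches-java types.
interface DoublesReader {
    double get(int index);  // one virtual call per element read
    int count();
}

// Heap-backed implementation: reads from an ordinary double[].
final class HeapReader implements DoublesReader {
    private final double[] items;
    HeapReader(double[] items) { this.items = items; }
    public double get(int index) { return items[index]; }
    public int count() { return items.length; }
}

// "Direct" implementation: reads from off-heap-style memory via a ByteBuffer.
final class DirectReader implements DoublesReader {
    private final ByteBuffer mem;
    DirectReader(ByteBuffer mem) { this.mem = mem; }
    public double get(int index) { return mem.getDouble(index * Double.BYTES); }
    public int count() { return mem.capacity() / Double.BYTES; }
}

public class IndirectionDemo {
    // Merge-style code is written once against the interface and works
    // for any mix of heap and direct sources.
    static double sum(DoublesReader r) {
        double s = 0;
        for (int i = 0; i < r.count(); i++) s += r.get(i);
        return s;
    }

    public static void main(String[] args) {
        double[] onHeap = {1.0, 2.0, 3.0};
        ByteBuffer offHeap = ByteBuffer.allocateDirect(3 * Double.BYTES);
        offHeap.putDouble(1.0).putDouble(2.0).putDouble(3.0);

        System.out.println(sum(new HeapReader(onHeap)));     // 6.0
        System.out.println(sum(new DirectReader(offHeap)));  // 6.0
    }
}
```

The flexibility comes from the single code path; the cost is the interface dispatch (and bounds-checked buffer reads) on every element access, which can show up as extra allocation or lost JIT optimizations relative to a direct `double[]` loop.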

I'm not clear on exactly which metric you need to optimize for:

  • Are you concerned about overall heap allocation size per sketch -- and in which state? Live (i.e., updatable), or in the more compact, non-updatable form? (The 0.8.3 version did not have a compact form; newer versions do.)
  • Or are you concerned about heap allocation churn -- i.e., how much memory is constantly allocated and then released as the sketch is being updated? (This churn is handled by the GC's Eden space.)

If your concern is heap allocation size per sketch, then I suggest you consider the newer KLL sketch, which is considerably smaller than the older "classic" quantiles sketch for the same accuracy and has a nearly identical API.
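As a rough sketch of what switching to KLL looks like (API names taken from recent datasketches-java releases -- verify them against the version you actually adopt):

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllDemo {
    public static void main(String[] args) {
        // KLL doubles sketch: same update/quantile style of API as the
        // classic quantiles sketch, with a smaller footprint for
        // comparable accuracy.
        KllDoublesSketch sketch = KllDoublesSketch.newHeapInstance();
        for (int i = 1; i <= 100_000; i++) {
            sketch.update(i);
        }
        // Estimate the median; exact rank semantics depend on the release.
        double median = sketch.getQuantile(0.5);
        System.out.println("estimated median: " + median);
    }
}
```

The `update(double)` / `getQuantile(double)` calls mirror the classic quantiles sketch, so migrating is mostly a matter of changing the construction call.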

If your concern is optimizing allocation churn, which would reduce load on the Eden space, there is not much I can say. This is not a property we normally optimize for, other than its impact on speed performance -- which, as your own numbers show, has not significantly changed.

What we do try to optimize for is accuracy per amount of space used, especially when in compact (immutable) form, and speed of updating and merging.

Nonetheless, I would strongly encourage you to update to the latest release: you will benefit from bug fixes and capabilities that didn't exist 5.5 years ago. Also, the version you are using is not only very old, it predates our move to Apache; we are not in a position to support it.