apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.

Home Page: https://datasketches.apache.org

Avoid unnecessary allocations in HllSketch

hpx7 opened this issue

MurmurHash3 does a bunch of unnecessary allocations, which means that every HllSketch update incurs at least two object allocations. All of these objects are very short-lived and need to be garbage collected, which is undesirable.

Given that the other sketch code doesn't do this and we don't guarantee thread safety, this seems to be an oversight. Let me know if you'd like a contribution to fix this.

https://github.com/DataSketches/sketches-core/blob/master/src/main/java/com/yahoo/sketches/hash/MurmurHash3.java#L59
https://github.com/DataSketches/sketches-core/blob/master/src/main/java/com/yahoo/sketches/hash/MurmurHash3.java#L253
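To make the concern concrete, here is a minimal sketch of the two API shapes involved. The mixer below is a toy stand-in, not the real MurmurHash3 transform, and all names are illustrative; the point is only the contrast between an API that returns a fresh `long[2]` per call and an allocation-free variant that writes into a caller-supplied buffer:

```java
// Illustration only: toy 128-bit "hash" standing in for MurmurHash3.
// The mixing math is NOT MurmurHash3's; only the API shape matters here.
public final class AllocationDemo {

    // Allocating style: returns a fresh long[2] on every call,
    // similar to the pattern flagged in the issue.
    static long[] hashAllocating(long key, long seed) {
        long h1 = mix(key ^ seed);
        long h2 = mix(h1 ^ key);
        return new long[] { h1, h2 };   // new array per update
    }

    // Allocation-free style: the caller supplies a reusable buffer,
    // so a tight update loop creates no garbage at all.
    static void hashInto(long key, long seed, long[] out) {
        long h1 = mix(key ^ seed);
        out[0] = h1;
        out[1] = mix(h1 ^ key);
    }

    // Simple 64-bit finalizer-style mixer (illustrative only).
    private static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    public static void main(String[] args) {
        long[] buf = new long[2];
        for (long k = 0; k < 4; k++) {
            long[] a = hashAllocating(k, 9001L);
            hashInto(k, 9001L, buf);
            if (a[0] != buf[0] || a[1] != buf[1]) {
                throw new AssertionError("mismatch at " + k);
            }
        }
        System.out.println("both styles agree");
    }
}
```

The second form makes allocation-freedom a guarantee of the API rather than something that depends on the JIT's escape analysis.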

Contributions are always welcome. However, this hash function is used everywhere, including in many applications outside of this library. Any change to this code CANNOT alter the hash transform in any way. We have a massive history of stored sketch images whose hash values would become useless if this hash function changed. So I would appreciate it if, along with your contribution, you provide proof with documented reasoning and testing.

I have clocked this hash function at about 5 to 6 nanoseconds per update in long-run testing, so I suspect that the JIT compiler may be inlining the entire HashState class as well as the long array you point out. A detailed examination of the generated assembly code would be very helpful in sorting out what is really going on. I doubt that the JVM could actually be allocating these objects in Eden and still achieve those speeds.

Nonetheless, even if this is the case, giving the JIT less work to do is a good thing, and a change such as you propose might shorten the JIT warm-up cycle.
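One cheap way to test the "JIT is eliminating the allocations" hypothesis is to toggle HotSpot's escape analysis and compare timings. A sketch of the invocation, where `HashBench` is a hypothetical stand-in for whatever benchmark class drives the update loop:

```shell
# Sketch only: HashBench is a hypothetical benchmark driver, not a real class.
# If escape analysis is hiding the allocations, disabling it should make the
# per-update cost (and Eden churn) visibly worse.
java -XX:+DoEscapeAnalysis HashBench   # default: non-escaping objects may be scalar-replaced
java -XX:-DoEscapeAnalysis HashBench   # force heap allocation of those objects
```

A large gap between the two runs would be evidence that the current speed depends on the optimizer rather than on the code being allocation-free.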

Thanks!

If you are serious about this, I will create a branch for you to submit a PR to. Just let me know. Please don't submit a PR against master. Thanks.

Yes I plan to contribute this. Let me know which branch.

UpdateMurmurHash3

@hpx7

I ran some exhaustive speed characterization tests against the master and the UpdateMurmurHash3 branch:

Plot 1
murmurhash3speed

Hash V1 is the current master. Hash V2 is your UpdateMurmurHash3 branch.

The X-axis is Iterations per Trial: the number of calls to the hash function in a very tight loop, each call with a different input value. The early part of the curve is higher due to Java/JIT warm-up. As you can see, it flattens out after about 1024 iterations per trial. There are 4 measurement points per octave along the X-axis. The number of trials per point starts at 2^23 and gradually decreases as X increases, down to about 2^17 at the high end.

The Y-axis is the average ns per hash call, which levels out at about 7.8 ns per call.

Plot 2
murmurhash3speed_null_run

Plot 2 is the "null" run: the exact same test harness, except that there is no call to the hash function; the loop just returns a different number each time. This attempts to measure the overhead of the calling loop, which appears to be about 0.7 ns. Subtracting this from the results of Plot 1 indicates that the raw speed of the hash function is about 7.1 ns.
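The measurement logic described above can be sketched roughly as follows. This is not the actual test harness; the `mix()` workload and all names are illustrative stand-ins, and a real harness would also handle warm-up and trial averaging as described in the plots:

```java
// Sketch: time a loop that calls a stand-in hash, time the same loop with
// no hash call (the "null" run), and subtract to estimate the raw hash cost.
public final class OverheadDemo {
    static long sink; // consumed result, discourages dead-code elimination

    // Illustrative workload, NOT the real MurmurHash3.
    static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        return z ^ (z >>> 29);
    }

    // Average nanoseconds per iteration over one trial.
    static double nsPerIter(boolean callHash, long iters) {
        long start = System.nanoTime();
        long acc = 0;
        for (long i = 0; i < iters; i++) {
            acc += callHash ? mix(i) : i; // null run just returns a different number
        }
        sink = acc;
        return (System.nanoTime() - start) / (double) iters;
    }

    public static void main(String[] args) {
        long iters = 10_000_000L;
        nsPerIter(true, iters);   // crude warm-up passes
        nsPerIter(false, iters);
        double hashNs = nsPerIter(true, iters);
        double nullNs = nsPerIter(false, iters);
        System.out.printf("hash loop: %.2f ns, null loop: %.2f ns, raw hash: %.2f ns%n",
                hashNs, nullNs, hashNs - nullNs);
    }
}
```

The `sink` field plays the role a benchmark framework's result consumer normally would; without it the JIT could delete the whole loop and the "measurement" would be meaningless.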


The other test that should be run is profiling the GC to visualize what may be going on in the Eden space. I don't have time to do this, so it would be great if you could set it up. You can use either Java VisualVM or YourKit.

You will find my test-harness Java code at: Code, and the config file at: Config.

Once you have set up YourKit or VisualVM, you should be able to run the same test harness, or a slightly modified version of it, to examine GC activity.
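A lighter-weight alternative to a full profiler is the JVM's own GC logging. A sketch of the invocation, where `HashBench` again stands in for the actual harness class (which I have not named here):

```shell
# Sketch only: HashBench is a hypothetical stand-in for the test harness.
# JDK 9+ unified logging; on JDK 8 use -verbose:gc -XX:+PrintGCDetails instead.
java -Xlog:gc HashBench
# If the update loop really allocates per call, young-generation (Eden)
# collections will show up at a rate proportional to the update rate;
# if the JIT scalar-replaces the objects, the log stays quiet.
```

This gives a quick yes/no on Eden churn before investing time in VisualVM or YourKit sessions.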


In conclusion, so far I can see no substantive difference in performance. I suspect that the JIT compiler is very effective at eliminating both the class object creation and the creation of the small size-2 long array, effectively inlining both.

Without evidence that your proposed changes actually make a difference, I am not inclined to make any changes to the library. But I will keep this open for a while to see what you come up with.