histogram_quantile does not work correctly for higher quantiles (missing sorting of the buckets)?
jodzga opened this issue · comments
Function histogram_quantile
does not work correctly for higher quantiles.
The image below shows the edge case, for quantile 1.0 but the same is true for high quantiles, close to 1.0 e.g. 0.9999.
Above, you can see that buckets "0.256" and above have counts 4284. The 1-quantile should be 0.256 but on the top graph you can see that it is calculated to be 8.192.
The source code for the function that computes the quantile in Prometheus:
https://github.com/prometheus/prometheus/blob/main/promql/quantile.go#L73
The same for m3:
The sort.Search is supposed to return the smallest bucket but it seems that the m3 version does not sort the buckets first. Perhaps m3 version assumes that buckets are sorted somewhere outside the function?
Is this the issue that 0d01d62 was intended to fix?
Thanks @markj-db and @gibbscullen for following up.
Mark, you are right lack of sorting is not the problem. The m3 implementation does sort the buckets by upperBound
in sanitizeBuckets
function:
I figured out what the problem is.
I was using the following query: histogram_quantile(1.0, sum(rate(coredns_dns_request_duration_seconds_bucket{}[5m])) by (le))
.
The problem is that the rate
function, while doing its magic, turns counts into fractions. Most fractions can't be expressed exactly as a floating point number (IEEE754 standard). The resulting number that represents the fraction is just an approximation that uses up all bits of mantissa. Addition of such numbers is neither commutative nor associative operation, see example here: https://play.golang.org/p/5jGVFNhGZo8.
As a consequence, the sum
of counts differ very slightly depending on in what order numbers were summed up. Here is an example graph that shows this problem:
Buckets 0.256 and above seem to have the same count.
Here is the same data point but showing numbers with full precision:
A a quick and dirty workaround is to get rid of the "noise" by rounding off some of the mantissa to make sum commutative and associative again. Unfortunately AFAIK Prometheus does not have a function for rounding number to certain precision so, as a workaround, I can simply multiply numbers by very large number, round to integer and then divide by the same very large number.
For example, instead of: histogram_quantile(1.0, sum(rate(coredns_dns_request_duration_seconds_bucket{}[5m])) by (le))
I can use: histogram_quantile(1.0, sum(round(1000000000*rate(coredns_dns_request_duration_seconds_bucket{}[5m]))) by (le) / 1000000000)
.
I can think of a couple of ways this issue could be "mitigated" in Prometheus/m3. These are just brainstorming ideas, not fully thought through:
- Do not depend on associativity and commutativity of
sum
by making order of operations deterministic e.g. sort all numbers before summing them up. - Get back associativity and commutativity by making
rate
and other functions return numbers that do not use full mantissa (decrease precision).
@gibbscullen I believe this is a pretty serious problem. AFAICT histogram_quantile queries for high percentiles will return wrong result. I'd love to help out but I am not a golang developer and onboarding (getting to the point where I can go through dev cycle: build, test, modify) could take me a significant amount of time.
Would someone from the m3 maintainers team be interested in collaborating on fixing this issue?
@jodzga the example you gave of sum(rate(foo[5m])) - sum(rate(foo[5m]))
is interesting because it implies the order of values being summed is non-deterministic. Perhaps it would be useful to look at how sum()
is implemented. My naive read of the M3 code suggests that query execution is parallelized:
https://github.com/m3db/m3/blob/master/src/query/executor/state.go#L192
This could well be a source of non-determinism if not handled carefully -- I'm not sure how the query is distributed or how any reduction operations are done.
@jodzga @markj-db sounds like you may already be working on this together, but one idea would be to change the default here
m3/src/cmd/services/m3query/config/config.go
Lines 279 to 280 in 2e5700a
prometheus
)