cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

Home Page: https://cortexmetrics.io/

"maxFailure (quorum) on a given error family" error: consider different wording

kevinburkesegment opened this issue

Recently, we received an error message that included this text, which comes from pkg/ring/batch.go. Based on the wording and other contextual information, we assumed the error was internal to our Prometheus instance, possibly caused by having too few nodes to reach quorum. We then began investigating cluster health. That took a long time and a lot of effort, since we don't manage these Prometheus nodes ourselves, and it turned out to be a red herring.

It turned out instead that we had a client sending out-of-order metrics, and all this error message was trying to say was that several different Prometheus nodes agreed that this was a problem.

How about rewording this error message to say, e.g., "data write failed by consensus"?
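To make the suggestion concrete, here is a rough sketch of how a message could tell "the replicas agreed the data was bad" apart from "the cluster could not reach quorum". This is not the code in pkg/ring/batch.go. Every name here (`errorFamily`, `familyOf`, `quorumFailure`, `describe`) is hypothetical, and classifying failures by HTTP-style status code is an assumption made only for illustration.

```go
package main

import "fmt"

// errorFamily is a hypothetical grouping of failures by who is at fault.
type errorFamily int

const (
	clientError errorFamily = iota // assumed: 4xx-style, the data itself was rejected
	serverError                    // assumed: 5xx-style, the replica could not handle the write
)

// familyOf classifies an HTTP-style status code (an assumption for this sketch).
func familyOf(status int) errorFamily {
	if status >= 400 && status < 500 {
		return clientError
	}
	return serverError
}

// quorumFailure returns the first error family whose failure count exceeds
// maxFailures, together with that count. Statuses below 400 count as successes.
func quorumFailure(statuses []int, maxFailures int) (errorFamily, int, bool) {
	counts := map[errorFamily]int{}
	for _, s := range statuses {
		if s < 400 {
			continue
		}
		f := familyOf(s)
		counts[f]++
		if counts[f] > maxFailures {
			return f, counts[f], true
		}
	}
	return 0, 0, false
}

// describe words the failure so the reader knows where to start looking.
func describe(f errorFamily, failed, replicas int) string {
	if f == clientError {
		return fmt.Sprintf("write rejected as invalid by %d of %d replicas; check the data the client is sending", failed, replicas)
	}
	return fmt.Sprintf("write failed on %d of %d replicas; quorum not reached", failed, replicas)
}

func main() {
	// Three replicas, at most one failure tolerated: two rejections fail the write.
	statuses := []int{400, 400, 202}
	if f, n, ok := quorumFailure(statuses, 1); ok {
		fmt.Println(describe(f, n, len(statuses)))
	}
}
```

With wording along these lines, the case we hit would have pointed us at the client right away instead of at cluster health.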