cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

Home Page: https://cortexmetrics.io/

Distributor requires ingester configuration to access ring state

jakubgs opened this issue

Describe the bug
While trying to move away from a multi kvstore configuration, as suggested by @friedrichg, I discovered bizarre behavior. I was using ETCD as the primary store and Consul as the secondary, mistakenly thinking this would let me easily switch to Consul in case of an ETCD outage, but apparently that is not so.
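For context, a multi KV store setup like the one described above might look roughly like the sketch below. It assumes the ingester ring section and Cortex's `multi` KV client options (`primary`, `secondary`, `mirror_enabled`); the etcd endpoint and Consul host are hypothetical placeholders. As far as I understand the multi client, all reads and writes go to the primary store and the secondary only receives mirrored writes, so it is meant for migrating between stores by switching the primary, not as an automatic fallback during an outage.

```yaml
# Sketch of a multi KV store for the ingester ring (addresses are hypothetical).
ingester:
  lifecycler:
    ring:
      kvstore:
        store: multi              # use the multi KV client
        multi:
          primary: etcd           # all reads and writes go here
          secondary: consul       # only receives mirrored writes
          mirror_enabled: true    # mirror primary writes to the secondary
        etcd:
          endpoints:
            - etcd-1.example.internal:2379    # hypothetical endpoint
        consul:
          host: consul.example.internal:8500  # hypothetical host
```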

When I removed the multi kvstore configuration and reduced it to use only ETCD, my distributors could not see any of the ingesters in the ring. At first they appeared as Unhealthy, and when the Forget button was pressed they simply disappeared. This confused me quite a bit, but on a hunch I added an ingester configuration section to the config for my distributor nodes:

ingester:
  lifecycler:
    ring:
      ...

After that, the distributor started recognizing the ingesters that were using ETCD as their primary KV store.
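A minimal sketch of what that ingester section might contain on a distributor node once the multi store is removed, assuming ETCD is the only KV store; the endpoint is a hypothetical placeholder and the rest of the distributor's config is omitted. The store settings here need to match the ones the ingesters use. Without this section, the distributor presumably falls back to the default ring store (Consul, as far as I know), which would explain seeing stale, Unhealthy ingester entries.

```yaml
# Distributor config fragment: this section tells the distributor
# where to read the ingester ring state from.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: etcd                         # must match the ingesters' store
        etcd:
          endpoints:
            - etcd-1.example.internal:2379  # hypothetical endpoint
```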

This shows that the distributor service actually requires the ingester configuration to interact with ingesters, even though the documentation for ingester_config states:

(screenshot of the ingester_config documentation)

This is clearly wrong, because it ALSO configures things for the distributor. I only discovered this through pure instinct.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex 1.16.0
  2. Configure the distributor node to use a KV store other than the default, using only the distributor config section.
  3. Notice that it continues to read ingester ring state from the ingester configuration.
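To make step 2 concrete, the sketch below contrasts the two sections involved, assuming Cortex 1.16-style YAML with hypothetical endpoints. The `distributor.ring.kvstore` block configures the distributors' own ring (as far as I know it is used, for example, by the global ingestion rate limit strategy), while the ring that distributors read to discover ingesters is configured only under `ingester.lifecycler.ring.kvstore`. Setting only the first block leaves the second at its default.

```yaml
distributor:
  ring:
    kvstore:
      store: etcd        # the distributors' own ring, NOT the ingester ring
      etcd:
        endpoints:
          - etcd-1.example.internal:2379   # hypothetical endpoint

ingester:
  lifecycler:
    ring:
      kvstore:
        store: etcd      # ring the distributor reads to discover ingesters
        etcd:
          endpoints:
            - etcd-1.example.internal:2379   # hypothetical endpoint
```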

Expected behavior
Sane behavior.

It appears this section is also required for the querier service to discover ingesters.

What other services also require it? This should be documented and made clear.

This is the configuration used for all components that use the Ingester Ring, which includes Ingester itself, Distributor, Querier and Ruler.
Yeah, I think we can do a better job of clarifying this in our docs. @danielblando @alanprot WDYT?
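Since ingesters, distributors, queriers and rulers all read the same ingester ring section, one way to avoid surprises is a single shared config file loaded by every node, with only `target` differing per node. This is a sketch under assumptions: the endpoint is hypothetical, and it assumes all components are started from the same file.

```yaml
# Shared by ingesters, distributors, queriers and rulers.
# Only `target` changes per node (ingester, distributor, querier, ruler).
target: querier

ingester:
  lifecycler:
    ring:
      kvstore:
        store: etcd
        etcd:
          endpoints:
            - etcd-1.example.internal:2379   # hypothetical endpoint
```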

Historically this was just a flag (and it still is):

-ring.store

Its name has nothing to do with the ingester, which at least makes it more neutral.
This then became ingester_config in the configuration file.
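Roughly, passing `-ring.store=etcd` on the command line should set the same field as the config-file fragment below. The exact mapping is an assumption based on the flag belonging to the ingester ring config; check the configuration reference for your Cortex version.

```yaml
# Config-file counterpart of the -ring.store=etcd flag (assumed mapping).
ingester:
  lifecycler:
    ring:
      kvstore:
        store: etcd
```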

There are more similar instances of this in the configuration file that are hard for new users to understand. Cortex strives for backward compatibility, and in the pursuit of backward compatibility we have left behind the configuration experience for new users. Sorry for that. Thanks for sharing your experience, @jakubgs.

I don't think we can fix that quickly without creating Cortex 2.0 and causing a lot of churn for existing users. What is probably better for everyone is to give new users more examples of working configurations so they can get started quickly and tweak things as needed. My suggestion to improve this in Helm: cortexproject/cortex-helm-chart#473

In your case, you are deploying this on virtual machines, so I don't think Helm applies to you. Do you want to contribute some of the configuration work you have done? I think some other users would benefit from it 😄

I understand the need for backwards compatibility, and that's fair, but the docs could be more clear about which config is used or required by which service.

Our infra repo for metrics infrastructure is private, but in theory I could extract just the Cortex role and make it public.
Though if I do that, it would be nice to also preserve the commit history, which would involve a bit more work. I'll think about how I could do that.

I have extracted our Cortex Ansible repository to a separate public repo: https://github.com/status-im/infra-role-cortex

I did it using this tool, though I had to do some history cleanup: https://github.com/newren/git-filter-repo

Not sure how useful this will be, but maybe it can help someone.