cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

Home Page: https://cortexmetrics.io/


multilevel cache with inmemory and redis

philiptrovato opened this issue · comments

Describe the bug
Implementing a multilevel cache (in-memory on top of a working Redis) results in the panic below. Similar error to #5475.

panic: a previously registered descriptor with the same fully-qualified name as Desc{fqName: "thanos_redis_getmulti_gate_gets_max", help: "Maximum number of concurrent gets.", constLabels: {component="store-gateway",level="L1",name="index-cache"}, variableLabels: {}} has different label names or a different help string
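The message above comes from the Prometheus client registry, which rejects a descriptor that reuses a fully-qualified name with different label names or a different help string. The following is a minimal, stdlib-only sketch of that consistency check. The `desc` and `registry` types are hypothetical stand-ins, not the client_golang API, and the first descriptor's label set (without `level`) is only an assumption about what another Redis client might register.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// desc is a hypothetical, simplified stand-in for a Prometheus metric descriptor.
type desc struct {
	fqName      string
	help        string
	constLabels map[string]string
}

// dimKey summarises what must match across descriptors that share a name:
// the help string and the sorted label names (label values may differ).
func (d desc) dimKey() string {
	names := make([]string, 0, len(d.constLabels))
	for name := range d.constLabels {
		names = append(names, name)
	}
	sort.Strings(names)
	return d.help + "|" + strings.Join(names, ",")
}

// registry remembers the dimension key first seen for each fully-qualified name.
type registry struct {
	dims map[string]string
}

func newRegistry() *registry {
	return &registry{dims: map[string]string{}}
}

// register returns an error when a name is reused with different label names or help.
func (r *registry) register(d desc) error {
	key := d.dimKey()
	if prev, ok := r.dims[d.fqName]; ok && prev != key {
		return fmt.Errorf("descriptor %q has different label names or a different help string", d.fqName)
	}
	r.dims[d.fqName] = key
	return nil
}

func main() {
	r := newRegistry()
	const name = "thanos_redis_getmulti_gate_gets_max"
	const help = "Maximum number of concurrent gets."

	// Assumed for illustration: a client registered without a "level" label.
	first := desc{fqName: name, help: help, constLabels: map[string]string{
		"component": "store-gateway", "name": "chunks-cache",
	}}
	// Label set taken from the panic message in the report.
	second := desc{fqName: name, help: help, constLabels: map[string]string{
		"component": "store-gateway", "level": "L1", "name": "index-cache",
	}}

	fmt.Println(r.register(first))
	fmt.Println(r.register(second))
}
```

The sketch only models the label-name and help comparison; the real registry performs further checks (for example, on exact duplicates) that are left out here.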

To Reproduce
Steps to reproduce the behavior:

  1. Cortex 1.16.0
  2. Added multilevel cache (inmemory on top of working redis): -blocks-storage.bucket-store.index-cache.backend=inmemory,redis

Expected behavior
Cortex should start up without panicking.

Additional Context

Similar error to #5475

That's pretty weird. The bug should already be fixed. I tried to reproduce this issue locally with the Cortex 1.16 binary and could not reproduce it. Can you please share your full store gateway config? I am trying to see if it is affected by something else.

https://github.com/cortexproject/cortex/blob/master/integration/querier_test.go#L54
We also have an integration test case that ensures the multilevel index cache with inmemory and redis works.

Sure, slightly redacted below.

Pretty much the base/default config, with some CLI overrides on the SGW. FYI, I stumbled across this. I don't really need the multilevel cache; I think the redis client-side cache would be similar (or better, since chunks could also be cached in memory). But I am having some issues with that (I don't really see any evidence that the client-side cache is created or works). I will create a separate issue on that to avoid sprawl.

-log.level=info
-frontend.log-queries-longer-than=5s
-blocks-storage.bucket-store.ignore-deletion-marks-delay=1h
-store-gateway.sharding-ring.wait-stability-min-duration=0s
-store-gateway.sharding-ring.wait-stability-max-duration=0s
-blocks-storage.bucket-store.chunks-cache.backend=inmemory,redis
-blocks-storage.bucket-store.chunks-cache.redis.addresses=XXXXXXXX:6379
-blocks-storage.bucket-store.chunks-cache.redis.tls-enabled=true
-blocks-storage.bucket-store.index-cache.backend=redis
-blocks-storage.bucket-store.index-cache.redis.addresses=XXXXXXXX:6379
-blocks-storage.bucket-store.index-cache.redis.tls-enabled=true
-blocks-storage.bucket-store.index-cache.redis.cache-size=1073741824
-blocks-storage.bucket-store.chunks-cache.redis.cache-size=1073741824

bucket_store:
  sync_dir: /opt/monitoring/cortex/tsdb-sync
  sync_interval: 5m0s
  max_concurrent: 100
  max_inflight_requests: 0
  tenant_sync_concurrency: 10
  block_sync_concurrency: 20
  meta_sync_concurrency: 20
  consistency_delay: 0s
  index_cache:
    backend: inmemory,redis
    inmemory:
      max_size_bytes: 1073741824
      enabled_items: []
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
      enabled_items: []
    redis:
      addresses: XXXXXXXX:6379
      username: ""
      password: XXXXXXXX
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: true
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 0
      enabled_items: []
  chunks_cache:
    backend: redis
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
    redis:
      addresses: XXXXXXXX:6379
      username: ""
      password: XXXXXXXX
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: true
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 0
    subrange_size: 16000
    max_get_range_requests: 3
    attributes_ttl: 168h0m0s
    subrange_ttl: 24h0m0s
  metadata_cache:
    backend: ""
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
    redis:
      addresses: ""
      username: ""
      password: ""
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: false
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 0
    tenants_list_ttl: 15m0s
    tenant_blocks_list_ttl: 5m0s
    chunks_list_ttl: 24h0m0s
    metafile_exists_ttl: 2h0m0s
    metafile_doesnt_exist_ttl: 5m0s
    metafile_content_ttl: 24h0m0s
    metafile_max_size_bytes: 1048576
    metafile_attributes_ttl: 168h0m0s
    block_index_attributes_ttl: 168h0m0s
    bucket_index_content_ttl: 5m0s
    bucket_index_max_size_bytes: 1048576
  ignore_deletion_mark_delay: 1h0m0s
  ignore_blocks_within: 0s
  bucket_index:
    enabled: true
    update_on_error_interval: 1m0s
    idle_timeout: 1h0m0s
    max_stale_period: 1h0m0s
  max_chunk_pool_bytes: 2147483648
  chunk_pool_min_bucket_size_bytes: 16000
  chunk_pool_max_bucket_size_bytes: 50000000
  index_header_lazy_loading_enabled: false
  index_header_lazy_loading_idle_timeout: 20m0s
  lazy_expanded_postings_enabled: false
  partitioner_max_gap_bytes: 524288
  estimated_max_series_size_bytes: 65536
  estimated_max_chunk_size_bytes: 16000
  postings_offsets_in_mem_sampling: 32
  series_batch_size: 10000
.......
.......
store_gateway:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      store: consul
      prefix: collectors/
      dynamodb:
        region: ""
        table_name: ""
        ttl: 0s
        puller_sync_time: 1m0s
        max_cas_retries: 10
      consul:
        host: localhost:8500
        acl_token: ""
        http_client_timeout: 20s
        consistent_reads: false
        watch_rate_limit: 1
        watch_burst_size: 1
      etcd:
        endpoints: []
        dial_timeout: 10s
        max_retries: 10
        tls_enabled: false
        tls_cert_path: ""
        tls_key_path: ""
        tls_ca_path: ""
        tls_server_name: ""
        tls_insecure_skip_verify: false
        username: ""
        password: ""
      multi:
        primary: ""
        secondary: ""
        mirror_enabled: false
        mirror_timeout: 2s
    heartbeat_period: 15s
    heartbeat_timeout: 1m0s
    replication_factor: 3
    tokens_file_path: ""
    zone_awareness_enabled: false
    keep_instance_in_the_ring_on_shutdown: false
    zone_stable_shuffle_sharding: false
    wait_stability_min_duration: 0s
    wait_stability_max_duration: 0s
    wait_instance_state_timeout: 10m0s
    final_sleep: 0s
    instance_id: XXXXXXXX
    instance_interface_names:
    - eth1
    - eth0
    instance_port: 0
    instance_addr: ""
    instance_availability_zone: ""
  sharding_strategy: default

@philiptrovato Thanks for sharing the config.

-blocks-storage.bucket-store.chunks-cache.backend=inmemory,redis
-blocks-storage.bucket-store.chunks-cache.redis.addresses=XXXXXXXX:6379
-blocks-storage.bucket-store.chunks-cache.redis.tls-enabled=true
-blocks-storage.bucket-store.index-cache.backend=redis

Is this the problematic config or your current working config?
I saw you enabled the multi-level cache for the chunks cache, which is not supported.
We only have a multi-level cache for the index cache.
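For readers unfamiliar with the idea, a multi-level cache checks a fast level first and falls back to slower levels on a miss. The following is a generic read-through sketch under that assumption; the `cache`, `mapCache`, and `multiLevel` names are hypothetical and this is not the Cortex or Thanos implementation. Both levels are plain maps here, with the second standing in for Redis.

```go
package main

import "fmt"

// cache is a hypothetical minimal interface; it is not the Cortex or Thanos cache API.
type cache interface {
	get(key string) ([]byte, bool)
	set(key string, val []byte)
}

// mapCache is an in-process cache used here to stand in for every level.
type mapCache map[string][]byte

func (m mapCache) get(key string) ([]byte, bool) {
	v, ok := m[key]
	return v, ok
}

func (m mapCache) set(key string, val []byte) {
	m[key] = val
}

// multiLevel checks each level in order, fastest first.
type multiLevel struct {
	levels []cache
}

// get returns the first hit and copies it into the faster levels that missed.
func (c multiLevel) get(key string) ([]byte, bool) {
	for i, level := range c.levels {
		if v, ok := level.get(key); ok {
			for _, faster := range c.levels[:i] {
				faster.set(key, v)
			}
			return v, true
		}
	}
	return nil, false
}

// set writes the value to every level.
func (c multiLevel) set(key string, val []byte) {
	for _, level := range c.levels {
		level.set(key, val)
	}
}

func main() {
	l1, l2 := mapCache{}, mapCache{} // l2 stands in for Redis
	c := multiLevel{levels: []cache{l1, l2}}

	l2.set("postings:foo", []byte("bar")) // present only in the second level
	v, ok := c.get("postings:foo")
	_, inL1 := l1.get("postings:foo")
	fmt.Println(string(v), ok, inL1)
}
```

In this sketch a hit in the second level is copied into the first, so a repeated lookup for the same key is served from memory.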

Very sorry, I added the "inmemory" part on the fly... The actual config is below.

-blocks-storage.bucket-store.chunks-cache.backend=redis
-blocks-storage.bucket-store.index-cache.backend=inmemory,redis

Additionally, "-blocks-storage.bucket-store.index-cache.backend" works fine when I set it to just "inmemory" or just "redis"; it only fails when both are set.

@philiptrovato Thanks.
I will test it and see if I can reproduce the issue.

I think you're onto something: it seems to have something to do with redis being set for both the index and chunks caches. I took the "chunks-cache" config out, and the SGW starts without error.