SGW Redis client side cache

Question

philiptrovato opened this issue 5 months ago

I am trying to enable Redis client-side caching for the index and chunks caches.

I have the setup below (Cortex 1.16.0, AWS Redis 7.0.7). It runs and the SGWs talk to Redis, but I don’t see any evidence that client-side caching is working.

I am not aware of any metrics or logs specific to client-side caching that would confirm it, so I validated it as follows.
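One server-side check that may help here: Redis 6 and later report client-tracking counters in the output of `INFO` (`tracking_clients` in the Clients section, `tracking_total_keys` in the Stats section). The sketch below is only an illustration, not something Cortex ships. It parses `INFO` text that you have already fetched (for example with `redis-cli INFO`) and reports whether any connection has tracking enabled. The helper names `parseInfo` and `trackingActive` are hypothetical, and whether a managed Redis service exposes these fields is an assumption to verify.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo turns the text returned by the Redis INFO command into a
// key/value map, skipping blank lines and "# Section" headers.
func parseInfo(raw string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, ":"); ok {
			fields[k] = v
		}
	}
	return fields
}

// trackingActive reports whether at least one connected client has
// CLIENT TRACKING enabled, according to the tracking_clients counter.
func trackingActive(fields map[string]string) bool {
	n, err := strconv.Atoi(fields["tracking_clients"])
	return err == nil && n > 0
}

func main() {
	// Sample INFO excerpt (made-up values) for illustration only.
	sample := "# Clients\r\nconnected_clients:12\r\ntracking_clients:0\r\n" +
		"# Stats\r\ntracking_total_keys:0\r\n"
	fields := parseInfo(sample)
	fmt.Println("tracking active:", trackingActive(fields))
	fmt.Println("tracked keys:", fields["tracking_total_keys"])
}
```

If `tracking_clients` stays at 0 while the SGWs are serving queries with the `redis.cache-size` flags set, that would be consistent with the client never enabling tracking, which matches the lack of change in network I/O described below.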

I looked at network I/O (bytes in/out) on the SGWs and on the Redis nodes, and I see no difference with the “redis.cache-size” flags set or removed. I would expect network I/O to drop once the “redis.cache-size” flags are set.

I also see no difference in memory utilization on the SGWs with the “redis.cache-size” flags set or removed. I would expect memory usage to increase once the flags are set, because of the client-side cache.

I am also a little unclear about what the "max_chunk_pool_bytes" flag does in relation to what I am trying to achieve (I haven’t experimented with it yet and am running the default of 2GB). Is this an in-memory chunks cache, or just working space that gets cleared quickly? If it is a cache, how is a client-side Redis chunks cache different?
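For context on the pool-versus-cache distinction: as far as I can tell, the Cortex configuration reference describes `max_chunk_pool_bytes` as the maximum size of a chunks pool used to reduce memory allocations, which suggests reusable working buffers rather than a keyed cache. The sketch below is a hypothetical, simplified illustration of that idea and is not the Cortex or Thanos implementation. `boundedPool` and its methods are made-up names. The point is that a pool hands back empty buffers with no key lookup, so it cannot serve a repeated read the way a client-side cache would.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errPoolExhausted = errors.New("pool exhausted")

// boundedPool hands out reusable byte slices and caps the total
// capacity that may be checked out at any one time.
type boundedPool struct {
	mu       sync.Mutex
	maxBytes int
	inUse    int
	free     [][]byte
}

func newBoundedPool(maxBytes int) *boundedPool {
	return &boundedPool{maxBytes: maxBytes}
}

// Get returns an empty slice with capacity >= size, reusing a returned
// buffer when one fits, or an error if the byte budget would be exceeded.
func (p *boundedPool) Get(size int) ([]byte, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i, b := range p.free {
		if cap(b) >= size && p.inUse+cap(b) <= p.maxBytes {
			p.free = append(p.free[:i], p.free[i+1:]...)
			p.inUse += cap(b)
			return b[:0], nil
		}
	}
	if p.inUse+size > p.maxBytes {
		return nil, errPoolExhausted
	}
	p.inUse += size
	return make([]byte, 0, size), nil
}

// Put returns a buffer obtained from Get. Its contents are discarded:
// the next Get sees an empty slice, not the previous data.
func (p *boundedPool) Put(b []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.inUse -= cap(b)
	p.free = append(p.free, b[:0])
}

func main() {
	pool := newBoundedPool(16000)
	buf, err := pool.Get(16000)
	if err != nil {
		panic(err)
	}
	buf = append(buf, "chunk bytes"...)
	pool.Put(buf)

	again, _ := pool.Get(8000)
	fmt.Println("reused capacity:", cap(again), "length:", len(again))
}
```

A cache, by contrast, would keep the chunk bytes under a key and return them on the next request for that key, which is what the “redis.cache-size” setting is expected to provide on the client side.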

Flags on the SGWs
-log.level=info
-frontend.log-queries-longer-than=5s
-blocks-storage.bucket-store.ignore-deletion-marks-delay=1h
-store-gateway.sharding-ring.wait-stability-min-duration=0s
-store-gateway.sharding-ring.wait-stability-max-duration=0s
-blocks-storage.bucket-store.chunks-cache.backend=redis
-blocks-storage.bucket-store.chunks-cache.redis.addresses=XXXXXXXX:6379
-blocks-storage.bucket-store.chunks-cache.redis.tls-enabled=true
-blocks-storage.bucket-store.index-cache.backend=redis
-blocks-storage.bucket-store.index-cache.redis.addresses=XXXXXXXX:6379
-blocks-storage.bucket-store.index-cache.redis.tls-enabled=true
-blocks-storage.bucket-store.index-cache.redis.cache-size=1500000000
-blocks-storage.bucket-store.chunks-cache.redis.cache-size=5000000000

Config
bucket_store:
  sync_dir: /opt/monitoring/cortex/tsdb-sync
  sync_interval: 5m0s
  max_concurrent: 100
  max_inflight_requests: 0
  tenant_sync_concurrency: 10
  block_sync_concurrency: 20
  meta_sync_concurrency: 20
  consistency_delay: 0s
  index_cache:
    backend: redis
    inmemory:
      max_size_bytes: 1073741824
      enabled_items: []
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
      enabled_items: []
    redis:
      addresses: XXXXXXXX:6379
      username: ""
      password: XXXXXXXX
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: true
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 1500000000
      enabled_items: []
  chunks_cache:
    backend: redis
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
    redis:
      addresses: XXXXXXXX:6379
      username: ""
      password: XXXXXXXX
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: true
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 5000000000
    subrange_size: 16000
    max_get_range_requests: 3
    attributes_ttl: 168h0m0s
    subrange_ttl: 24h0m0s
  metadata_cache:
    backend: ""
    memcached:
      addresses: ""
      timeout: 100ms
      max_idle_connections: 16
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 100
      max_get_multi_batch_size: 0
      max_item_size: 1048576
      auto_discovery: false
    redis:
      addresses: ""
      username: ""
      password: ""
      db: 0
      master_name: ""
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
      max_async_concurrency: 50
      max_async_buffer_size: 10000
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      tls_enabled: false
      tls_cert_path: ""
      tls_key_path: ""
      tls_ca_path: ""
      tls_server_name: ""
      tls_insecure_skip_verify: false
      cache_size: 0
    tenants_list_ttl: 15m0s
    tenant_blocks_list_ttl: 5m0s
    chunks_list_ttl: 24h0m0s
    metafile_exists_ttl: 2h0m0s
    metafile_doesnt_exist_ttl: 5m0s
    metafile_content_ttl: 24h0m0s
    metafile_max_size_bytes: 1048576
    metafile_attributes_ttl: 168h0m0s
    block_index_attributes_ttl: 168h0m0s
    bucket_index_content_ttl: 5m0s
    bucket_index_max_size_bytes: 1048576
  ignore_deletion_mark_delay: 1h0m0s
  ignore_blocks_within: 0s
  bucket_index:
    enabled: true
    update_on_error_interval: 1m0s
    idle_timeout: 1h0m0s
    max_stale_period: 1h0m0s
  max_chunk_pool_bytes: 2147483648
  chunk_pool_min_bucket_size_bytes: 16000
  chunk_pool_max_bucket_size_bytes: 50000000
  index_header_lazy_loading_enabled: false
  index_header_lazy_loading_idle_timeout: 20m0s
  lazy_expanded_postings_enabled: false
  partitioner_max_gap_bytes: 524288
  estimated_max_series_size_bytes: 65536
  estimated_max_chunk_size_bytes: 16000
  postings_offsets_in_mem_sampling: 32
  series_batch_size: 10000
tsdb:
  dir: /opt/monitoring/cortex/tsdb
  block_ranges_period:
    - 2h0m0s
  retention_period: 6h0m0s
  ship_interval: 1m0s
  ship_concurrency: 10
  head_compaction_interval: 1m0s
  head_compaction_concurrency: 5
  head_compaction_idle_timeout: 1h0m0s
  head_chunks_write_buffer_size_bytes: 4194304
  stripe_size: 16384
  wal_compression_enabled: false
  wal_segment_size_bytes: 134217728
  flush_blocks_on_shutdown: false
  close_idle_tsdb_timeout: 0s
  head_chunks_write_queue_size: 0
  max_tsdb_opening_concurrency_on_startup: 10
  max_exemplars: 0
  memory_snapshot_on_shutdown: false
  out_of_order_cap_max: 32
.......
.......
store_gateway:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      store: consul
      prefix: collectors/
      dynamodb:
        region: ""
        table_name: ""
        ttl: 0s
        puller_sync_time: 1m0s
        max_cas_retries: 10
      consul:
        host: localhost:8500
        acl_token: ""
        http_client_timeout: 20s
        consistent_reads: false
        watch_rate_limit: 1
        watch_burst_size: 1
      etcd:
        endpoints: []
        dial_timeout: 10s
        max_retries: 10
        tls_enabled: false
        tls_cert_path: ""
        tls_key_path: ""
        tls_ca_path: ""
        tls_server_name: ""
        tls_insecure_skip_verify: false
        username: ""
        password: ""
      multi:
        primary: ""
        secondary: ""
        mirror_enabled: false
        mirror_timeout: 2s
    heartbeat_period: 15s
    heartbeat_timeout: 1m0s
    replication_factor: 3
    tokens_file_path: ""
    zone_awareness_enabled: false
    keep_instance_in_the_ring_on_shutdown: false
    zone_stable_shuffle_sharding: false
    wait_stability_min_duration: 0s
    wait_stability_max_duration: 0s
    wait_instance_state_timeout: 10m0s
    final_sleep: 0s
    instance_id: ip-10-250-39-113.ec2.internal
    instance_interface_names:
      - eth1
      - eth0
    instance_port: 0
    instance_addr: ""
    instance_availability_zone: ""
  sharding_strategy: default

Ben Ye · Answer 1 · Fri Jan 19 2024 12:14:36 GMT+0800 (China Standard Time)

Hi @philiptrovato, this does indeed seem to be a bug. I will submit a fix.