Add ability for prometheus & thanos sidecar to flush on graceful shutdown
Nashluffy opened this issue
Component(s)
Prometheus
What is missing? Please describe.
We run several short-lived (sometimes only 1 hour in age) clusters. When using the thanos sidecar approach, downscaling a prometheus replica (either permanently or removing shards) will result in data loss of all chunks in the head.
Several existing issues have touched on this problem:
#4967
prometheus/prometheus#12261
thanos-io/thanos#1849
It would be great to have native support for flushing and uploading what’s in the head in prometheus-operator (likely requiring changes to other components as well).
Unfortunately there's no TSDB API for "flushing" the head, but you can create a snapshot of TSDB, then move all new blocks in that snapshot into the top-level data dir.
The thanos sidecar can then perform its own "flushing" in the form of uploading blocks one last time.
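As a rough illustration of the "move all new blocks" step, here is a minimal Python sketch. It assumes a block is a directory containing a meta.json file and that "new" means the block name is not yet present in the top-level data dir. The function name promote_new_blocks and both path arguments are hypothetical, not part of Prometheus or Thanos.

```python
import os
import shutil


def promote_new_blocks(snapshot_dir: str, data_dir: str) -> list[str]:
    """Move block directories that exist only in the snapshot into data_dir.

    Returns the names of the blocks that were moved.
    """
    moved = []
    for name in sorted(os.listdir(snapshot_dir)):
        src = os.path.join(snapshot_dir, name)
        dst = os.path.join(data_dir, name)
        # Assumption: a TSDB block is a directory holding meta.json; skip the rest.
        if not os.path.isfile(os.path.join(src, "meta.json")):
            continue
        # Blocks already present in the data dir were persisted earlier.
        if os.path.exists(dst):
            continue
        shutil.move(src, dst)
        moved.append(name)
    return moved
```

Running it a second time against the same snapshot moves nothing, so a retried hook should be harmless.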
prometheus-operator feels like the most natural place to orchestrate this, but open to discussion!
kind: Prometheus
spec:
  thanos:
    flushOnShutdown: true
Describe alternatives you've considered.
I'm currently achieving this in a separate container that uses a preStop hook to
- call the snapshot endpoint of prometheus
- move the new blocks from that snapshot dir into the top-level data dir
- run thanos tools bucket upload-blocks.
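For the first step of the hook, a minimal Python sketch of calling the snapshot endpoint could look like the following. It assumes Prometheus runs with --web.enable-admin-api and that the snapshot lands under the snapshots directory of the data dir. The helper names, base URL, and data dir are placeholders of mine, not something from the setup above. The upload step is left to the thanos command itself, since its flags aren't given here.

```python
import json
import os
import urllib.request


def snapshot_dir_from_response(body: bytes, data_dir: str) -> str:
    """Turn the snapshot API response into the on-disk snapshot path."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot request failed: {payload}")
    # The API returns the snapshot name; the directory is <data_dir>/snapshots/<name>.
    return os.path.join(data_dir, "snapshots", payload["data"]["name"])


def take_snapshot(base_url: str, data_dir: str, timeout: float = 60.0) -> str:
    """POST to the TSDB admin snapshot endpoint and return the snapshot path."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/admin/tsdb/snapshot", method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return snapshot_dir_from_response(resp.read(), data_dir)
```

In a hook this might be called as take_snapshot("http://localhost:9090", "/prometheus"), with both values being examples only. The whole sequence has to finish within the pod's termination grace period.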
The snapshot doesn't take much storage, as existing blocks in the snapshot are hard links to the actual block files.
We previously used a Thanos receiver setup which avoided this problem altogether, but it was wildly more expensive and quite a lot of overhead to operate.
Environment Information.
Environment
Kubernetes Version: 1.27
Prometheus-Operator Version: 0.73
Hey @Nashluffy thanks for this new issue. 😄
Yes, the described steps would work and sound pretty nice.
I'd like to have something less hacky by not relying on lifecycle hooks.
I just opened this new issue on the Thanos Project, let's see what people think about it.
thanos-io/thanos#7295
Thanks! I'll keep the prometheus-operator discussion here
Just another point: I think a call to the flush endpoint should be part of the Prometheus finalizer as well, not just when scaling down shards. This would capture my use-case, as we don't use shards.
Seems aligned with one of the ideas we had for graceful shutdown. (See https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/proposals/202310-shard-autoscaling.md#snapshot--upload-on-shutdown)
I think we could extend the proposed API to also provide this alternative as a shutdown option. Of course, that requires me to continue and finish my PR 😅