Add ability for prometheus & thanos sidecar to flush on graceful shutdown

Question

Nashluffy opened this issue 2 months ago

Component(s)

Prometheus

What is missing? Please describe.

We run several short-lived clusters (sometimes only an hour old). With the Thanos sidecar approach, scaling down a Prometheus replica (either permanently or by removing shards) loses all chunks still in the head.

Several existing issues have touched on this problem:
#4967
prometheus/prometheus#12261
thanos-io/thanos#1849

It would be great to have native support for flushing and uploading what’s in the head in prometheus-operator (likely requiring changes to other components as well).

Unfortunately there's no TSDB API for "flushing" the head, but you can create a snapshot of TSDB, then move all new blocks in that snapshot into the top-level data dir.

The Thanos sidecar can then perform its own "flushing" by uploading blocks one last time.
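For reference, here is a minimal sketch of the snapshot call in Python. It assumes Prometheus is reachable at `http://localhost:9090` and was started with `--web.enable-admin-api`. The function names are hypothetical and only for illustration. The endpoint returns the name of a directory created under `snapshots/` in the data dir.

```python
import json
import urllib.request


def parse_snapshot_name(body: bytes) -> str:
    """Extract the snapshot directory name from the admin API response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot request failed: {payload}")
    return payload["data"]["name"]


def take_snapshot(base_url: str = "http://localhost:9090") -> str:
    """POST to the TSDB snapshot endpoint and return the snapshot name.

    base_url is an assumption; adjust it to wherever Prometheus listens.
    skip_head=false asks Prometheus to include the head block in the snapshot.
    """
    url = f"{base_url}/api/v1/admin/tsdb/snapshot?skip_head=false"
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_snapshot_name(response.read())
```

The head ends up in the snapshot as a regular block, which is what makes the "move it into the data dir" step possible afterwards.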

prometheus-operator feels like the most natural place to orchestrate this, but open to discussion!

kind: Prometheus
spec:
  thanos:
    flushOnShutdown: true

Describe alternatives you've considered.

I'm currently achieving this in a separate container that uses a preStop hook to:

1. call the snapshot endpoint of Prometheus
2. move the new blocks from that snapshot dir into the top-level data dir
3. run thanos tools bucket upload-blocks
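The second and third steps could look roughly like the Python sketch below. It is an illustration under assumptions, not the exact hook: the function names are hypothetical, a block is taken to be any directory containing `meta.json`, and the `--path` and `--objstore.config-file` flags should be checked against the Thanos version in use.

```python
import shutil
from pathlib import Path
from typing import List


def promote_new_blocks(data_dir: Path, snapshot_name: str) -> List[str]:
    """Move blocks that exist only in the snapshot into the top-level data dir."""
    snapshot_dir = data_dir / "snapshots" / snapshot_name
    moved = []
    for block in sorted(snapshot_dir.iterdir()):
        # Assumption: a TSDB block is a directory that contains meta.json.
        if not (block / "meta.json").is_file():
            continue
        target = data_dir / block.name
        if target.exists():
            continue  # already persisted, nothing to promote
        shutil.move(str(block), str(target))
        moved.append(block.name)
    return moved


def upload_command(data_dir: Path, objstore_config: Path) -> List[str]:
    """Build the argv for the final upload (flag names are assumptions)."""
    return [
        "thanos", "tools", "bucket", "upload-blocks",
        f"--path={data_dir}",
        f"--objstore.config-file={objstore_config}",
    ]
```

The returned command can be handed to `subprocess.run(..., check=True)` so that a failed upload makes the hook fail visibly instead of silently dropping the head data.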

The snapshot takes little extra storage, because existing blocks in the snapshot are hard links to the actual block files.

We previously used a Thanos receiver setup which avoided this problem altogether, but it was wildly more expensive and quite a lot of overhead to operate.

Environment Information.

Kubernetes Version: 1.27
Prometheus-Operator Version: 0.73

Nicolas Takashi · Answer 1 · Sun Apr 21 2024 03:54:46 GMT+0800 (China Standard Time)

Hey @Nashluffy thanks for this new issue. 😄
Yes, the described steps would work and sound pretty nice.
I'd like to have something less hacky that doesn't rely on lifecycle hooks.
I just opened a new issue on the Thanos project, so let's see what people think about it:
thanos-io/thanos#7295

Nash Luffman · Answer 2 · Mon Apr 22 2024 15:41:38 GMT+0800 (China Standard Time)

Thanks! I'll keep the prometheus-operator discussion here

Just another point: I think a call to the flush endpoint should be part of the Prometheus finalizer as well, not just when scaling down shards. This would capture my use-case, as we don't use shards.

Arthur Silva Sens · Answer 3 · Fri Apr 26 2024 02:59:04 GMT+0800 (China Standard Time)

Seems aligned with one of the ideas we had for graceful shutdown (see https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/proposals/202310-shard-autoscaling.md#snapshot--upload-on-shutdown).

I think we could extend the proposed API to also provide this alternative as a shutdown option. Of course, that requires me to continue and finish my PR 😅