m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform

Home Page: https://m3db.io/

[dbNode]: Add metric reflecting commit-log size on disk

asafm opened this issue

Add a metric that reports the on-disk size of the commit log that has not been read yet.

Motivation
When a node crashes and restarts, it starts accepting writes and appends them to the commit log. If the node keeps crashing with OOM (for example, while some shards are bootstrapped from peers), the commit log will eventually grow to a size on disk that exceeds the memory available to the node, so on the next restart it will never get past the commit-log bootstrap. A metric exposing that size, with an alert on it, would let us catch this early and trigger a config change for that node that prefers peer bootstrap over commit-log bootstrap in this case.

@asafm -- thanks for submitting! We would be happy to review a contribution for this new metric request.