[dbNode]: Add metric reflecting commit-log size on disk
asafm opened this issue · comments
Add a metric that reports the on-disk size of the commit-log data that hasn't been read yet.
Motivation
When a node crashes and restarts, it starts accepting writes and pushes them into the commit-log. If the node keeps crashing on OOM (for example, when some shards are bootstrapped from peers), the commit-log will eventually grow on disk beyond the memory available to the node, so on the next restart it will never get through the commit-log bootstrap. Exposing that size as a metric and alerting on it would let us catch this in time and trigger a config change for that node that prefers peer bootstrap over commit-log bootstrap in this case.
@asafm -- thanks for submitting! We would be happy to review a contribution for this new metric request.