gardener / etcd-backup-restore

Collection of components to back up and restore the etcd of a Kubernetes cluster.

Periodically run defragmentation on the embedded etcd during restoration

shreyas-s-rao opened this issue

Enhancement (What you would like to be added):
Periodically run (compaction and) defragmentation on the embedded etcd during restoration, either after every X delta snapshots have been applied or each time the etcd DB size grows by Y GB.

Motivation (Why is this needed?):
Restoration of large backups whose events add up to more than the etcd quota (set to a default of 8GB) will fail. This especially hurts during snapshot compactions, where restoration is run on backups containing a large number of events (defaulting to 1 million events) and is therefore quite likely to fail. It becomes necessary to defragment the etcd DB during restoration, i.e., while applying delta snapshots.

Approach/Hint to implement the solution (optional):
These defragmentations can be triggered in one of two ways:

  1. based on DB size (which is a large topic by itself, covered in #556), or
  2. the restorer can simply run (compaction and) defragmentation after applying every X delta snapshots
    i. Since the maximum size of a delta snapshot is determined by delta-snapshot-memory-limit, and the embedded etcd quota size is determined by embedded-etcd-quota-bytes, we can calculate the maximum number of delta snapshots that can be restored before a defragmentation is required, using the formula (embedded-etcd-quota-bytes / delta-snapshot-memory-limit) / 2. The additional division by 2 makes the restorer defragment after approximately half the etcd DB size quota is reached. The result is the maximum number of deltas that can safely be applied between defragmentations.
    ii. Alternatively, the restorer can keep track of the total size of the delta snapshots it has applied, and defragment once half the etcd DB size quota is reached.

/assign

Compaction is already configured and runs on the embedded etcd during restoration, since we start the embedded etcd with auto-compaction mode set to periodic and, I guess, the default retention period of 30 minutes.

cfg.AutoCompactionMode = ro.Config.AutoCompactionMode

Yes, but this applies only to the auto-compaction of the etcd. Since we would run etcd compaction right before running defragmentation anyway, auto-compaction may not be needed anymore. Even if it is still needed, it wouldn't clash with the compaction that's run before defragmentation.

/unassign
/assign @ishan16696