arnavtripathy/etcd-backup-notifier

ETCD Backup Automation

ETCD data is a crucial part of a kubernetes cluster. As such a regular backup system should be created to make sure that all the data is stored someplace to be used in case of a total cluster failure . This implementation should also address the following issues:-

Since etcd runs on localhost of the master node , hence our implementation might require scheduling workloads on the master node. So care must be taken that it should not take up too much CPU or memory power .
It should have a good alerting system. The stakeholders and the administrator should get regular alerts about the backing up.
The data should be kept in a proper and reliable location so that it can be easily restored in case of emergency.
Automation is key. The whole process should not require any stakeholder interaction unless there's an absolute necessity

While preparing a solution, we have addressed the issues and come up with a strategy which we have represented in the below diagram:

The general flow of the backup process goes something like this:-

A system cronjob at a particular time in the day schedules a kubernetes job which will create a snapshot of the etcd data and store it in the node host path. This is a containerised process which does not take up a lot of computation power of the node.
The data will be automatically pushed to a remote s3 bucket and the job will be deleted. This solves the reliability of the data storage issue.
An email and slack alert is sent whenever a backup action happens both for success and failure. This is to doubly ensure that things are going fine.
All of this is completely automated using python and bash.

arnavtripathy / etcd-backup-notifier

About

Languages