Backup cannot be taken when the last backup-ed Pod is not Ready
arosh opened this issue · comments
Describe the bug
When a job that takes a MySQL backup runs, it fails if the Pod from which the previous backup was taken is not Ready. mysql-backup
outputs the following error message:
Error: failed to choose source instance: failed to show master status: dial tcp: lookup moco-CLUSTERNAME-0.moco-CLUSTERNAME.MYNAMESPACE.svc on 10.xxx.yyy.zzz:53: no such host
Environments
- Version: MOCO v0.16.1
- OS: Flatcar Container Linux (stable)
To Reproduce
1. Deploy a MySQLCluster.
2. Take a backup.
3. Stop the Pod that the backup in step 2 was taken from (.Status.Backup.SourceIndex).
4. Take a backup again.
Expected behavior
Backups are taken from another Ready replica.
Additional context
The following statement in ChoosePod seems to be intended as a fallback for the case where the Pod used for the last backup is not Ready.
https://github.com/cybozu-go/moco/blob/v0.16.1/backup/backup.go#L249
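However, just before that statement, ChoosePod tries to get the status of the Pod used for the last backup. If that Pod is not Ready, the status query fails (the "failed to show master status" error above) and the backup job aborts before the fallback is reached.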
https://github.com/cybozu-go/moco/blob/v0.16.1/backup/backup.go#L216-L230
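One possible shape for the expected behavior is sketched below. This is not MOCO's actual code: `podInfo`, `chooseSource`, and `errNoReadyPod` are hypothetical names, and the preference order (last source if Ready, then another Ready replica, then the Ready primary) is an assumption made for illustration. The point is only that readiness is checked first, so no status query would ever be sent to a Pod that is not Ready.

```go
package main

import "errors"

// podInfo is a hypothetical, simplified view of one MySQL instance Pod.
type podInfo struct {
	Index   int  // ordinal of the Pod in the StatefulSet
	Ready   bool // whether the Pod is Ready
	Primary bool // whether the instance is the current primary
}

// errNoReadyPod is returned when no Pod can serve as a backup source.
var errNoReadyPod = errors.New("no Ready Pod available for backup")

// chooseSource returns the index of the Pod to back up from.
// Pods that are not Ready are skipped before anything else happens,
// so a stopped Pod can never make the selection itself fail.
// Assumed preference order: the last source (if it is a Ready replica),
// then the first other Ready replica, then the Ready primary.
func chooseSource(pods []podInfo, lastIndex int) (int, error) {
	primary := -1
	fallback := -1
	for _, p := range pods {
		if !p.Ready {
			continue // never contact a Pod that is not Ready
		}
		if p.Primary {
			primary = p.Index
			continue
		}
		if p.Index == lastIndex {
			return p.Index, nil // last source is still usable
		}
		if fallback < 0 {
			fallback = p.Index // remember the first other Ready replica
		}
	}
	if fallback >= 0 {
		return fallback, nil
	}
	if primary >= 0 {
		return primary, nil
	}
	return 0, errNoReadyPod
}
```

With Pod 0 stopped after being the last source, Pod 1 as primary, and Pod 2 as a Ready replica, `chooseSource(pods, 0)` in this sketch returns index 2 rather than an error.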