Update to 0.4.0-beta.2: no snapshots available on restore
ohthehugemanatee opened this issue
I had an amd64 node die in my cluster and had to migrate some litestream-dependent workloads to arm64. Looks like there's no arm64 build of 0.3.8 available, so I had to run 0.4.0-beta.2 instead.
But on first run on the new node, the init container fails to restore. Apparently it can't see any replicas.
```
#> litestream restore /db/sonarr.db
cannot determine latest generation: generation time bounds: no snapshots available
#> litestream snapshots /db/sonarr.db
replica  generation  index  size  created
```
The backup directory has plenty of data:
```
#> tree /db.backup/sonarr.db
/db.backup/sonarr.db
└── generations
    └── e7d6ea14e0477efe
        ├── snapshots
        │   └── 0001bd38.snapshot.lz4
        └── wal
            ├── 0001bd38_00000000.wal.lz4
            ├── 0001bd38_00000438.wal.lz4
            ├── 0001bd38_000049d0.wal.lz4
            ├── 0001bd38_00005618.wal.lz4
            ...
            ├── 0001bd98_00009380.wal.lz4
            └── 0001bd98_0000a7f8.wal.lz4

4 directories, 396 files
```
What am I missing?
I just tried with a 0.3.8 container running on amd64, and it worked fine. So, validated this is a regression or some missing upgrade documentation. Here's my complete setup:
Sidecar container to replicate:
```yaml
- name: sonarr-litestream
  image: litestream/litestream:0.3.8
  args: ['replicate']
  volumeMounts:
    - name: sonarr-db
      mountPath: /db
    - name: nfs
      subPath: ".docker/config/sonarr/db-backup.litestream"
      mountPath: /db.backup
    - name: litestream-config
      mountPath: /etc/litestream.yml
      subPath: litestream.yml
```
litestream.yml:
```yaml
dbs:
  - path: /db/sonarr.db
    replicas:
      - path: /db.backup/sonarr.db
```
Now trying to restore from a one-off container.
```
❯ k run -it --tty litestream --image=litestream/litestream:0.4.0-beta.2 --overrides='
{
  "spec": {
    "nodeSelector": { "kubernetes.io/arch": "amd64" },
    "volumes": [
      {
        "name": "nfs",
        "persistentVolumeClaim": {
          "claimName": "nfs-claim"
        }
      },
      {
        "name": "litestream-config",
        "configMap": {
          "name": "sonarr-litestream"
        }
      }
    ],
    "containers": [
      {
        "name": "init-litestream",
        "image": "litestream/litestream:0.4.0-beta.2",
        "command": ["sleep"],
        "args": ["1200"],
        "volumeMounts": [
          {
            "name": "nfs",
            "subPath": ".docker/config/sonarr/db-backup.litestream",
            "mountPath": "/db.backup"
          },
          {
            "name": "litestream-config",
            "mountPath": "/etc/litestream.yml",
            "subPath": "litestream.yml"
          }
        ],
        "stdin": true,
        "stdinOnce": true,
        "tty": true
      }
    ]
  }
}' bash
```
It only succeeds if that restore container is running the same version that created the backups in the first place.
But then... how am I supposed to change my setup to 0.4.0? Is there an upgrade script necessary?
@ohthehugemanatee The replica directory structure changed in v0.4.0. You won't be able to restore a backup from v0.3.x but when you start up v0.4.0, it should see that there are no snapshots available and issue a new one and begin replicating from there.
> Looks like there's no arm64 build of 0.3.8 available, so I had to run 0.4.0-beta.2 instead.
What OS are you running? There's some arm64 builds for Linux for v0.3.8: https://github.com/benbjohnson/litestream/releases/tag/v0.3.8
Thanks for answering! I'm on kubernetes (so, Linux containers), and I don't see any container builds for 0.3.8 on arm64. Any chance you could run the multi-arch docker pipeline on 0.3.8 once so we get one?
> when you start up v0.4.0, it should see that there are no snapshots available and issue a new one and begin replicating from there.
The problem is, if you use litestream as "a simple persistence layer for Kubernetes services", you are effectively stuck on 0.3.x forever. By definition, every container starts without state; the database is created by `litestream restore`. If you update to 0.4, that step will fail and you have an empty db... nothing to replicate from.
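For context, the init-container restore pattern in question looks roughly like this (a sketch using the volume and config names from my sidecar spec above; `-if-replica-exists` makes the restore a no-op when the backup store is genuinely empty, but in the 0.4 case the store is *not* empty, it's just in the old format):

```yaml
initContainers:
  - name: init-litestream
    image: litestream/litestream:0.3.8
    # Restore the db from the replica before the app container starts.
    # -if-replica-exists exits cleanly if no backup exists yet (first run).
    args: ['restore', '-if-replica-exists', '/db/sonarr.db']
    volumeMounts:
      - name: sonarr-db
        mountPath: /db
      - name: nfs
        subPath: ".docker/config/sonarr/db-backup.litestream"
        mountPath: /db.backup
      - name: litestream-config
        mountPath: /etc/litestream.yml
        subPath: litestream.yml
```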
I guess you have to exec into a running container that still has the DB, install a copy of litestream 0.4, and run the first replication from there. Does it break the existing 0.3.8 backup store if 0.4.0 writes over the same location?
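Something like the following is what I have in mind (an untested sketch; `sonarr-pod` is a placeholder for whatever pod still holds the live database file, and whether 0.4.0 can safely write alongside the old 0.3.x generations is exactly my question):

```
# 1. Exec into the pod that still has the live database file
kubectl exec -it sonarr-pod -- sh

# 2. Inside, with a litestream 0.4.0 binary on hand, seed the new-format
#    replica from the live db (config points at the same /db.backup path)
litestream replicate -config /etc/litestream.yml

# 3. Once a 0.4-format snapshot exists, verify restore works on this version
litestream restore -o /tmp/check.db /db/sonarr.db
```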
I am also seeing this when attempting to restore from S3.
If I run this command on v0.3.8 on my local machine, it works:

```
litestream restore --config litestream/primary.yml -replica s3 -o mydb.db /data/mydb
```

But v0.4.0-beta2 complains:

```
cannot determine latest generation: generation time bounds: no snapshots available
```
This is indeed a bit more sinister than initially thought. I didn't understand at first that "it should see that there are no snapshots available and issue a new one and begin replicating from there" means the replication target is treated as empty and the latest local file is assumed to exist.
We're also going to use Litestream on Kubernetes with ephemeral mounts, and recently I switched over from 0.3.9 to master to start tracking 0.4.0 before it's released. When switching to the new version we also hit the "no snapshots available" issue, but since it's not yet in production it wasn't as critical as it would have been for someone running it there.
Even though we can sidestep this issue by skipping 0.3.x entirely, it would likely be a good idea to add support during restore for falling back to the old 0.3.x structure when the 0.4.x one turns up empty. At the very least, a big warning in the 0.4.0 release notes that restoring from a 0.3.x snapshot is not supported and is a breaking change, so people have a heads-up.
Thanks!
As the old main branch was scrapped, I'll close this as it's no longer relevant.