Bizarre Debian Buster DEB pkg test failure
ximon18 opened this issue
The pkg-test (debian:buster) Packaging workflow job is consistently failing with a bizarre error, while the Debian Stretch and Debian Bullseye jobs work just fine.
The error message is: (see here)
cannot locate base snap core20: No such file or directory
This appears to have something to do with the Snap daemon which we don't use or care about.
A quick test of the Debian Buster .deb package (produced by the pkg (debian:buster) job in the same workflow) in a Docker container seems to show that the rtrtr binary works fine; at least it runs and can report its own version number.
For now we'll just disable the pkg-test (debian:buster) job; hopefully we can work out what is going on here later.
When you launch a machine/container, snapd tries to refresh the snaps if the image used was old. If certain commands/apps installed through snap are executed during the core refresh, one might end up with an error message like "cannot locate base snap core20: No such file or directory".
Thank you @yurtesen for your explanation, it is much appreciated.
I'm afraid I'm still at a loss to know what this is about. We do not do anything with snap, nor, I assume, does snap care about Docker images. I'm not sure what a "core refresh" is. Is this perhaps an issue with the underlying GitHub Actions hosted runner...?
Are you able to expand on your feedback to indicate how it relates to the workflow that is failing?
Thanks,
Ximon
@ximon18 I only found this issue because I had exactly the same error in our GitLab CI environment. I do not know for sure if it applies to your environment, as I am not familiar with it. But snap might care about docker images if docker was installed as a snap package in your test VM/container image. Snap core is what makes snap tick ... -> https://snapcraft.io/install/core20/ubuntu <- So programs installed using snap stop working momentarily while core is being updated/refreshed.
We are using nested LXD containers, but I imagine the issue can happen whenever a VM or any machine is launched. In our case, as LXD is only available as a snap package, the lxc commands we executed were susceptible. If we used docker installed as a snap package, then docker commands would be affected while snap is "refreshing" core.
We solved the issue by adding the entry 127.0.0.1 api.snapcraft.io to /etc/hosts on the test machine to stop snap from refreshing packages, because I found out that snap refresh can't be stopped by design 🤦 If you can't do that, all you might need is some delay before starting to use docker, because a snap refresh usually takes only 10-20 seconds.
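For the "add some delay" option, a minimal Python sketch of retrying a command that fails with the transient error might look like the following. The helper name run_with_retry, the default of five attempts, and the assumption that the message shows up on stderr are illustrative and not taken from this thread; the 10 second delay simply follows the 10-20 second estimate above.

```python
import subprocess
import time

# Text of the transient failure reported in this issue.
TRANSIENT_MARKER = "cannot locate base snap"


def run_with_retry(command, attempts=5, delay=10.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run `command`, retrying only while it fails with the transient snap error.

    `run` and `sleep` are injectable so the helper can be exercised without
    snap, lxc or docker installed. Assumption: the error text is on stderr.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        transient = TRANSIENT_MARKER in (result.stderr or "")
        if not transient or attempt == attempts:
            # Either a different failure or out of retries: hand it back.
            return result
        sleep(delay)
```

A call such as run_with_retry(["lxc", "list"]) would then tolerate a refresh that finishes within roughly 40 seconds, and return any other failure immediately.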
Thanks @yurtesen.
That's a very enlightening comment, thank you.
The workflow that failed also uses an LXD container, running on an Ubuntu host provided by GitHub Actions. I don't know if we can edit /etc/hosts on that machine or, even if we can, whether it would take effect early enough, or whether we would even want to, as it feels like a very hacky solution.
As you indicate that the problem occurs when snap refresh is running, I wonder if there is some reliable way to detect whether a snap refresh is in progress and to wait until it completes. Our workflow already waits, for example, until cloud-init completes before proceeding.
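One possible way to approach the "detect and wait" idea is to poll the output of snap changes until no change is still running. The sketch below is only that: the function names are made up, and it assumes that snap changes prints one row per change with a numeric ID in the first column and a status such as Do, Doing or Done in the second. It has not been tried against the failing workflow.

```python
import subprocess
import time

# Statuses assumed to mean that a change has not finished yet.
IN_PROGRESS = {"Do", "Doing"}


def change_in_progress(changes_output):
    """Return True if any row of `snap changes` output looks unfinished."""
    for line in changes_output.splitlines():
        parts = line.split()
        # Data rows start with a numeric change ID; the header row does not.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] in IN_PROGRESS:
            return True
    return False


def snap_changes():
    """Return the stdout of `snap changes` (empty if nothing was printed)."""
    result = subprocess.run(["snap", "changes"],
                            capture_output=True, text=True, check=False)
    return result.stdout


def wait_until_idle(get_changes=snap_changes, timeout=120.0, interval=2.0,
                    sleep=time.sleep):
    """Poll until no change is in progress; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if not change_in_progress(get_changes()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling wait_until_idle() on the host before the first lxc command would play a similar role to the existing cloud-init wait, with the timeout acting as a safety net if the status parsing turns out to be wrong.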
Now that you've explained what is going on I at least have a better idea of what to try and investigate, thanks for that!
Ximon
@ximon18 I think one other way to resolve the issue would be to execute snap refresh on the problem machine (or in the script) before executing any lxc commands. It blocks until the refresh is completed. Unfortunately, I am not able to see the output from the run you linked in your first message.
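As a rough sketch of that ordering in Python: the helper name refresh_then_run is hypothetical, and it assumes the script has enough privileges to run snap refresh (typically root or sudo).

```python
import subprocess


def refresh_then_run(command, run=subprocess.run):
    """Run `snap refresh` to completion, then run `command`.

    `run` is injectable for testing. With check=True, a failing refresh
    raises CalledProcessError before `command` is attempted.
    """
    # Blocks until the refresh has finished, as described above.
    run(["snap", "refresh"], check=True)
    return run(command, check=True)
```

For example, refresh_then_run(["lxc", "list"]) would force the refresh to finish first, so the lxc command should not overlap with it.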
Unfortunately, GitHub has deleted the log I linked to.
I tried today to reproduce the problem but could not. If we see it again I'll come back to your useful advice here.
Thanks!
Ximon