`Start` is very slow
maxzinkus opened this issue · comments
In testing #78, I also noticed that the `Retry` loop surrounding `Exec("hwclock -s")` fails a few times every boot on my machine. It looks like it's caused by startup not yet being ready -- that is, the SSH server hasn't completely come up.
Overall (especially with the current exponential backoff on `Retry` -- changed in #73) this means that `Booting...` takes around 2+4+8=14 seconds.
If this loop is skipped, `hwclock -s` is not run -- meaning the system time will not be reset based on the RTC. However, given #78, it seems that the RTC is wrong regardless after sleep and therefore this step might not be necessary.
If the loop is skipped, `Start` reaches `qemu/ops.go:354` before `qemu-system-...` has finished starting the VM. It then checks for the pidfile, which is not yet present, and determines that `Start` failed.
However, the VM is still correctly started, and by the time the user follows up with `alpine ssh ...`, `qemu-system-...` has completed startup, the pidfile has been created, and the ssh server is ready.
Whatever the resolution is to #78 (either no change or some change), this `Retry` loop should instead wait for one of two things to happen:
1. `qemu-system-...` exits (likely with an error/after receiving a signal)
2. `alpine.pid` is created (which `qemu-system-...` seems to do late in the boot process)
If 1 occurs, the VM surely did not start correctly and the pidfile should be missing (TODO: check if pidfile is cleaned up after `sigkill`).
If 2 occurs, the VM likely started correctly.
This way, `Start` can wait asynchronously for bootup to finish, and then proceed when ready after a host-system-speed dependent amount of time. As it stands, on slow systems, `Retry` will spend many seconds sleeping after multiple failures, and on fast systems, `Retry` will spend some seconds sleeping after a few failures -- in both cases wasting precious cycles and time.
Is this related to what I'm seeing right now with over 10 minutes of dots?
$ alpine start simplistic-background
2023/02/28 19:56:40 Booting...................
It's 20:07:XX as of posting this.
Is there a way to get verbose output during boot? I'm worried my VM has been corrupted somehow.
It's not. I mean, it kind of is, but not really.
Likely, `Start` (the thing that boots the VM) is trying to SSH into the VM to run a command as part of the boot process. That SSH is failing in a `Retry` loop that waits a long time after failures. That is part of what this issue plans to fix, but it's not the root cause of your bug.
- In `~/.macpine/simplistic-background/config.yaml`, are `sshuser` and `sshpassword` correct?
- Is the `root` password still the default (`root`)?
- Is `sshport` correct and is anything on your host system listening on that port already?
- Did you modify `/etc/ssh/sshd_config` inside the VM at all?
I don't want to take away from the original problem, but I waited it out and got this:
2023/02/28 20:14:00 unable to sync clocks: after 10 attempts, last error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
but checking "In `~/.macpine/simplistic-background/config.yaml`, are `sshuser` and `sshpassword` correct?" was the correct fix.