Root Daemon: Not running
phooijenga opened this issue · comments
Describe the bug
The telepresence root daemon does not start. The daemon.log file is empty.
To Reproduce
- Run telepresence connect. It tells me it needs root privileges, which I provide:
$ telepresence connect
Launching Telepresence Root Daemon
Need root privileges to run: /Users/paul/bin/telepresence daemon-foreground /Users/paul/Library/Logs/telepresence '/Users/paul/Library/Application Support/telepresence'
- Run telepresence status, observe that root daemon is not running:
$ telepresence status
OSS User Daemon: Running
  Version : v2.18.0
  Executable : /Users/paul/bin/telepresence
  Install ID : b4931622-dabf-46bc-8218-266d4b782476
  Status : Connected
  Kubernetes server : https://198.19.249.184:6443
  Kubernetes context: founda-k3s-1
  Namespace : apps
  Manager namespace : ambassador
  Intercepts : 0 total
Root Daemon: Not running
OSS Traffic Manager: Connected
  Version : v2.18.0
  Traffic Agent: docker.io/datawire/tel2:2.18.0
I don't see the daemon-foreground in ps output. When I run it manually it doesn't seem to crash (and writes a startup message to daemon.log), but telepresence status still reports 'not running'. It does create a /var/run/telepresence-daemon.socket.
Versions (please complete the following information):
OSS Client : v2.18.0
OSS Root Daemon : v2.18.0
OSS User Daemon : v2.18.0
OSS Traffic Manager: v2.18.0
Traffic Agent : docker.io/datawire/tel2:2.18.0
macOS Sonoma 14.4.1 (23E224)
Additional context
It appears as if this issue started happening after upgrading to macOS Sonoma 14.4.1.
Is this amd64 or arm64 (M1)?
M1.
$ arch
arm64
$ file `which telepresence`
/Users/paul/bin/telepresence: Mach-O 64-bit executable arm64
I did some debugging, and it turns out that EnsureUserDaemon swallows the error returned by ensureRootDaemonRunning here.
In my case, the error is "daemon service did not start: timeout while waiting for daemon to start", which unfortunately does not tell us anything new.
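So, it turns out that this system has timestamp_timeout=0 configured, which means sudo never caches credentials. Running sudo true therefore doesn't actually do anything useful: the next sudo invocation still needs a password.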
Apparently timestamp_timeout=0 is now company policy, so I can't simply change it.
Alright, to wrap this all up: if I manually start the root daemon with sudo (the daemon-foreground command printed by telepresence connect) before running telepresence connect, it works.
Thanks for the info. Any ideas on how we can improve how this is handled in Telepresence?
I think not hiding the error is a good start (#3559), but I'm not sure if the underlying problem can be solved completely. Maybe ensureRootDaemonRunning could check if the process is still alive as well as trying to connect to the socket. That way the user wouldn't have to wait the full 10 seconds to be told the daemon failed to start.
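The liveness idea above could look roughly like this in shell. This is only a sketch: is_alive and wait_for_daemon are made-up names, the real fix would live in Telepresence's Go code, and the sketch only checks that the socket file exists rather than connecting to it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Telepresence.

# Succeeds if a process with the given PID exists. kill -0 sends no signal, but
# it fails with a permission error for processes owned by another user (such as
# a daemon started through sudo), so fall back to ps in that case.
is_alive() {
    kill -0 "$1" 2>/dev/null || ps -p "$1" >/dev/null 2>&1
}

# Usage: wait_for_daemon PID SOCKET_PATH TIMEOUT_SECONDS
# Returns 0 once the socket exists, 2 if the process is gone, 1 on timeout.
wait_for_daemon() {
    wfd_pid=$1
    wfd_socket=$2
    wfd_remaining=$3
    while [ "$wfd_remaining" -gt 0 ]; do
        if [ -S "$wfd_socket" ]; then
            return 0
        fi
        if ! is_alive "$wfd_pid"; then
            echo "daemon process $wfd_pid exited before creating $wfd_socket" >&2
            return 2
        fi
        sleep 1
        wfd_remaining=$((wfd_remaining - 1))
    done
    echo "timeout while waiting for $wfd_socket" >&2
    return 1
}
```

A caller might run wait_for_daemon "$daemon_pid" /var/run/telepresence-daemon.socket 10, where daemon_pid is whatever PID the launcher recorded. Note that a shell which has not yet reaped an exited child may still see it as alive, so this works best on a PID that some other process started.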
Another possibility (which I've not extensively tested) might be to run sudo --list (instead of sudo true) and check for timestamp_timeout=0 in the output. If it's there, telepresence can instruct the user how to run the daemon themselves.
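A rough sketch of the sudo --list check, for illustration only. The function name has_zero_timestamp_timeout is hypothetical, and the exact layout of the sudo --list output may differ between sudo versions, so the function just searches the text it is given.

```shell
#!/bin/sh
# Hypothetical helper, not part of Telepresence.
# Succeeds if the given text (expected to be the output of `sudo --list`)
# contains timestamp_timeout=0. The trailing group avoids matching fractional
# values such as timestamp_timeout=0.5.
has_zero_timestamp_timeout() {
    printf '%s\n' "$1" | grep -Eq 'timestamp_timeout=0([^0-9.]|$)'
}
```

It could be called as has_zero_timestamp_timeout "$(sudo --list)". Keep in mind that sudo --list may itself prompt for a password.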
Another possibility would be to use sudo --non-interactive --no-update --validate to check whether the user's cached credentials are valid (or no authentication is required). The check would run twice: once before prompting (instead of the current sudo --non-interactive true) and once again afterwards, to make sure the credentials are indeed cached.
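The two-step validation could be sketched as follows. The function names are invented for illustration, the prompt step reuses the sudo true call mentioned earlier in the thread, and the sketch assumes the installed sudo supports the --no-update flag quoted above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Telepresence.

# Succeeds if sudo would run a command right now without prompting.
# --no-update keeps this probe from refreshing the cached timestamp.
sudo_credentials_cached() {
    sudo --non-interactive --no-update --validate 2>/dev/null
}

# Returns 0 if credentials are (or become) cached, 1 if authentication
# failed, and 2 if authentication succeeded but nothing was cached.
ensure_sudo_cached() {
    if sudo_credentials_cached; then
        return 0
    fi
    # Prompt the user for a password.
    sudo true || return 1
    if ! sudo_credentials_cached; then
        echo "sudo did not cache credentials (timestamp_timeout=0?); start the root daemon manually with sudo" >&2
        return 2
    fi
    return 0
}
```

A return code of 2 is the case described in this issue: the password was accepted, but the next sudo call would prompt again, so the launcher can print instructions instead of waiting for the timeout.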
It looks like the error display has been addressed. I'll leave this open as a feature request for the process check suggestions.