Deregistration doesn't appear to be working
pabloromeo opened this issue
It appears that the deregistration logic isn't detecting the signal Docker Swarm sends to the container.
The only thing I'm doing below is scaling the service to 1 replica, and once it's listening for jobs, scaling it down to 0 replicas.
But the runners in GitHub are not removed and just accumulate.
Here are the logs for the startup as well as shutdown:
2021-09-08 19:59:17 Runner reusage is disabled
2021-09-08 19:59:17 Obtaining the token of the runnet
2021-09-08 19:59:17 Configuring
2021-09-08 19:59:18 --------------------------------------------------------------------------------
2021-09-08 19:59:18 | ____ _ _ _ _ _ _ _ _ |
2021-09-08 19:59:18 | / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ |
2021-09-08 19:59:18 | | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| |
2021-09-08 19:59:18 | | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ |
2021-09-08 19:59:18 | \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ |
2021-09-08 19:59:18 | |
2021-09-08 19:59:18 | Self-hosted runner registration |
2021-09-08 19:59:18 | |
2021-09-08 19:59:18 --------------------------------------------------------------------------------
2021-09-08 19:59:18 # Authentication
2021-09-08 19:59:21 √ Connected to GitHub
2021-09-08 19:59:23 # Runner Registration
2021-09-08 19:59:24 √ Runner successfully added
2021-09-08 19:59:27 √ Runner connection is good
2021-09-08 19:59:27 # Runner settings
2021-09-08 19:59:27 √ Settings Saved.
2021-09-08 19:59:27 .path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/actions-runner
2021-09-08 19:59:27 Starting Runner listener with startup type: service
2021-09-08 19:59:27 Started listener process, pid: 98
2021-09-08 19:59:27 Started running service
2021-09-08 19:59:31 √ Connected to GitHub
2021-09-08 19:59:32 Listening for Jobs
2021-09-08 19:59:49 Shutting down runner listener
2021-09-08 19:59:49 Sending SIGINT to runner listener to stop
2021-09-08 19:59:49 Sending SIGKILL to runner listener
2021-09-08 19:59:49 Shutting down runner listener
2021-09-08 19:59:49 Sending SIGINT to runner listener to stop
2021-09-08 19:59:49 Sending SIGKILL to runner listener
2021-09-08 19:59:49 Exiting...
2021-09-08 19:59:49 Exiting...
2021-09-08 19:59:50 Runner listener exited with error code 143
2021-09-08 19:59:50 Runner listener exit with undefined return code, re-launch runner in 5 seconds.
It is being run with the following stack:
version: '3.4'
services:
  actions_runner:
    image: myoung34/github-runner:ubuntu-bionic
    environment:
      ACCESS_TOKEN: <redacted>
      RUNNER_NAME_PREFIX: actions_runner
      RUNNER_SCOPE: repo
      REPO_URL: <redacted>
      RUNNER_WORKDIR: "/path/to/workdir/"
    volumes:
      - /path/to/workdir:/path/to/workdir
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: stop-first
I've also tried manually going into the container and running kill -9 1, but I'm still not able to trigger the deregistration flow, unfortunately. Any ideas?
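A side note on the kill -9 test: signal 9 is SIGKILL, which a process can never catch, so it cannot exercise a trap-based deregistration path however the entrypoint is written. SIGTERM is the signal to test with. The sketch below shows the difference using a throwaway worker; probe_signal and the inline worker script are hypothetical names for illustration, not part of the image.

```shell
#!/usr/bin/env bash
# Sketch: compare how a worker with a TERM trap reacts to SIGTERM vs SIGKILL.
# `probe_signal` and the inline worker are illustrative, not from the image.

probe_signal() {
  sig=$1
  out_file=$(mktemp)
  # Worker: installs a TERM trap, then blocks on a background sleep.
  sh -c 'trap "echo cleanup ran; kill \$c; exit 0" TERM; sleep 10 & c=$!; wait $c' \
    >"$out_file" 2>&1 &
  pid=$!
  sleep 1                      # give the worker time to install its trap
  kill -s "$sig" "$pid"
  status=0
  wait "$pid" || status=$?     # 128 + signal number if the worker was killed
  echo "signal=$sig exit=$status output=$(cat "$out_file")"
  rm -f "$out_file"
}

probe_signal TERM   # trap runs:  signal=TERM exit=0 output=cleanup ran
probe_signal KILL   # no cleanup: signal=KILL exit=137 output=
```

With SIGKILL the worker is terminated by the kernel before any handler can run, which is why the cleanup line never appears.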
Ah, I see. Yeah, the odd thing is that if it were related to the token stuff, I would have expected the "Caught SIGTERM. Deregistering runner" message in the logs, or an attempt to deregister it somewhere.
@pabloromeo, please try out the change in PR #141 and let us know if it resolves the issue for you.
@pabloromeo if #141 solves it for you, I'm happy to merge it as-is.
Unfortunately the new version of entrypoint.sh didn't make a difference: still no deregistration when the container is stopped.
Logs look exactly the same as before. The signal is killing the listener but the trap configured for deregistration is never invoked.
I've even manually logged in to the container and issued a kill -SIGTERM 1, and it also killed the listener but didn't trigger the deregistration logic. It would appear the traps aren't working.
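For context on why a trap can appear to be ignored: a shell runs a trap handler only once it regains control, so if the entrypoint is blocked on a foreground command, the handler is deferred until that command returns. A common pattern is to start the long-running process in the background and block on wait, which a trapped signal interrupts right away. The sketch below illustrates that pattern only. It is not the project's entrypoint.sh, and deregister_runner, run_with_cleanup, and the sleep stand-in for the listener are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a signal-aware entrypoint pattern (assumption: not the actual
# entrypoint.sh from the image; all names here are placeholders).

deregister_runner() {
  echo "Caught SIGTERM. Deregistering runner"
  # The real runner removal command would go here.
}

run_with_cleanup() {
  # On TERM: deregister, stop the child, and exit cleanly.
  trap 'deregister_runner; kill "$child" 2>/dev/null || true; exit 0' TERM
  "$@" &            # start the long-running process in the background
  child=$!
  # `wait` is interrupted as soon as a trapped signal arrives, so the
  # handler runs immediately instead of after the child has finished.
  wait "$child" || return $?
}

# Usage in an entrypoint (a stand-in command replaces the listener):
#   run_with_cleanup sleep 300
```

For this to work in a container, the script has to be the process that actually receives the stop signal, which normally means it runs as PID 1 rather than behind another wrapper shell.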
This should be fixed in master now:
√ Connected to GitHub
2021-09-15 15:26:32Z: Listening for Jobs
^CExiting...
Caught SIGTERM. Deregistering runner
# Runner removal
√ Runner removed successfully
√ Removed .credentials
√ Removed .runner
Worked like a charm! Excellent work :) Now I can run replicated runners within an orchestrator, with self-cleanup.