myoung34 / docker-github-actions-runner

This will run the new self-hosted GitHub Actions runners with Docker-in-Docker.

Deregistration doesn't appear to be working

pabloromeo opened this issue

It appears that the deregistration logic isn't detecting the signal Docker Swarm sends to the container.
In the run below, the only thing I do is scale the service up to 1 replica and then, once the runner is listening for jobs, scale it back down to 0 replicas.
However, the runners in GitHub are not removed; they just accumulate.

Here are the logs for both startup and shutdown:

2021-09-08 19:59:17	Runner reusage is disabled
2021-09-08 19:59:17	Obtaining the token of the runnet
2021-09-08 19:59:17	Configuring
2021-09-08 19:59:18	--------------------------------------------------------------------------------
2021-09-08 19:59:18	|        ____ _ _   _   _       _          _        _   _                      |
2021-09-08 19:59:18	|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
2021-09-08 19:59:18	|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
2021-09-08 19:59:18	|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
2021-09-08 19:59:18	|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
2021-09-08 19:59:18	|                                                                              |
2021-09-08 19:59:18	|                       Self-hosted runner registration                        |
2021-09-08 19:59:18	|                                                                              |
2021-09-08 19:59:18	--------------------------------------------------------------------------------
2021-09-08 19:59:18	# Authentication
2021-09-08 19:59:21	√ Connected to GitHub
2021-09-08 19:59:23	# Runner Registration
2021-09-08 19:59:24	√ Runner successfully added
2021-09-08 19:59:27	√ Runner connection is good
2021-09-08 19:59:27	# Runner settings
2021-09-08 19:59:27	√ Settings Saved.
2021-09-08 19:59:27	.path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/actions-runner
2021-09-08 19:59:27	Starting Runner listener with startup type: service
2021-09-08 19:59:27	Started listener process, pid: 98
2021-09-08 19:59:27	Started running service
2021-09-08 19:59:31	√ Connected to GitHub
2021-09-08 19:59:32	Listening for Jobs
2021-09-08 19:59:49	Shutting down runner listener
2021-09-08 19:59:49	Sending SIGINT to runner listener to stop
2021-09-08 19:59:49	Sending SIGKILL to runner listener
2021-09-08 19:59:49	Shutting down runner listener
2021-09-08 19:59:49	Sending SIGINT to runner listener to stop
2021-09-08 19:59:49	Sending SIGKILL to runner listener
2021-09-08 19:59:49	Exiting...
2021-09-08 19:59:49	Exiting...
2021-09-08 19:59:50	Runner listener exited with error code 143
2021-09-08 19:59:50	Runner listener exit with undefined return code, re-launch runner in 5 seconds.

It is being run with the following stack:

version: '3.4'

services:
  actions_runner:
    image: myoung34/github-runner:ubuntu-bionic
    environment:
      ACCESS_TOKEN: <redacted>
      RUNNER_NAME_PREFIX: actions_runner
      RUNNER_SCOPE: repo
      REPO_URL: <redacted>
      RUNNER_WORKDIR: "/path/to/workdir/"
    volumes:
      - /path/to/workdir:/path/to/workdir
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: stop-first

I've also tried manually going into the container and running kill -9 1, but unfortunately I'm still not able to trigger the deregistration flow. Any ideas?

This should be resolved in #135; I'll try to revive that PR and test it.

Ah, I see. The odd thing is that if it were related to the token handling, I would have expected the "Caught SIGTERM. Deregistering runner" message in the logs, or at least an attempt to deregister somewhere.

@pabloromeo, please try out the change in PR #141 and let us know if it resolves the issue for you.

@pabloromeo if #141 solves it for you, I'm happy to merge it as-is.

Unfortunately, the new version of entrypoint.sh didn't make a difference; there is still no deregistration when the container is stopped.

Logs look exactly the same as before. The signal is killing the listener but the trap configured for deregistration is never invoked.
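The exit code in the shutdown logs is consistent with this reading. By shell convention, a process killed by signal N exits with status 128 + N, so the listener's "error code 143" means it was terminated by signal 15 (SIGTERM); the signal did reach the listener. A small sketch for decoding such statuses follows. `describe_exit_status` is a hypothetical helper name, and it relies only on the shell's builtin `kill -l`, which maps an exit status above 128 back to a signal name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a process exit status into a readable description.
# Convention assumed: a status above 128 means "killed by signal (status - 128)".
describe_exit_status() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    # `kill -l <status>` prints the signal name (e.g. TERM) for a
    # signal-terminated exit status.
    echo "terminated by SIG$(kill -l "$status") (signal $((status - 128)))"
  else
    echo "exited normally with status $status"
  fi
}
```

For example, `describe_exit_status 143` reports SIGTERM, while `describe_exit_status 137` reports SIGKILL, which is the status a container's main process ends with when Docker gives up waiting after SIGTERM and force-kills it.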

I've even logged in to the container manually and issued kill -SIGTERM 1; it also killed the listener but didn't trigger the deregistration logic. It appears the traps aren't working.
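One possible explanation, offered as an assumption rather than a diagnosis of the actual entrypoint.sh: a shell that receives a trapped signal while a foreground command is running does not run the trap until that command returns. If the entrypoint runs the listener in the foreground, the listener can be stopped by the same shutdown (or the container force-killed) before the deregistration trap ever runs. A common pattern is to start the long-running process in the background and block in `wait`, which a trapped signal interrupts immediately. The minimal sketch below shows that pattern; `run_listener_with_trap`, `on_term`, `deregister_runner` and `listener_pid` are hypothetical names, and deregistration is only a placeholder `echo`, not a real call to GitHub.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption, not the actual entrypoint.sh): traps only run
# after the current foreground command returns, so the long-running command
# is started in the background and the script blocks in `wait`, which a
# trapped signal interrupts.

listener_pid=""

# Hypothetical deregistration hook. A real runner image would unregister here,
# e.g. via the runner's ./config.sh remove --token "$RUNNER_TOKEN" (assumed).
deregister_runner() {
  echo "Caught SIGTERM. Deregistering runner"
  echo "placeholder: runner registration would be removed here"
}

# Trap handler: deregister first, then stop the listener and exit with 128 + 15.
on_term() {
  deregister_runner
  if [ -n "$listener_pid" ] && kill -0 "$listener_pid" 2>/dev/null; then
    kill -TERM "$listener_pid"
    wait "$listener_pid" 2>/dev/null
  fi
  exit 143
}

# Runs the given command as the "listener" and blocks until it exits on its
# own or until SIGTERM/SIGINT arrives.
run_listener_with_trap() {
  trap on_term TERM INT
  "$@" &
  listener_pid=$!
  wait "$listener_pid"
}
```

In an image, the script would end with something like `run_listener_with_trap ./run.sh` (assuming the runner's `run.sh` start script), and the script itself must receive the signal: it has to run as PID 1 or sit behind an init such as tini or dumb-init that forwards signals. No trap can help with `kill -9`, since SIGKILL cannot be caught, so the `kill -9 1` attempt earlier in the thread could not have triggered deregistration under any entrypoint.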

This should be fixed in master now.

√ Connected to GitHub

2021-09-15 15:26:32Z: Listening for Jobs
^CExiting...
Caught SIGTERM. Deregistering runner

# Runner removal


√ Runner removed successfully
√ Removed .credentials
√ Removed .runner

Worked like a charm! Excellent work :) Now I can run replicated runners within an orchestrator, with self-cleanup.