myoung34 / docker-github-actions-runner

This will run the new self-hosted GitHub Actions runners with docker-in-docker.

matrix docker (container) builds fail

splitice opened this issue

@myoung34 I suspect this is a GitHub Actions bug. Are you able to confirm? If so, I'll file an issue with them accordingly.

name: Build
on:
  push:
jobs:
  build:
    runs-on: [self-hosted,linux]
    container: debian:buster
    strategy:
      matrix:
        build:
        - name: debug
        - name: production
[...]

Fails to start:

  /usr/bin/docker start ce81cdc1a1979d041cd6f5ae1fd5623a9a63ae09c1cac85b897a1f6059b56e07
  Error response from daemon: network github_network_4684e53698f34574a4cc3c670171f70c not found
  Error: failed to start containers: ce81cdc1a1979d041cd6f5ae1fd5623a9a63ae09c1cac85b897a1f6059b56e07
  Error: Docker start fail with exit code 1

The matrix strategy appears to be the issue, but I can't see why.

Feedback appreciated.

@splitice it worked for me here: https://github.com/OctoKode/test1/runs/3767512089?check_suite_focus=true

name: Test

on: [push]

jobs:
  test:
    runs-on: self-hosted
    strategy:
      matrix:
        build:
        - name: debug
        - name: production
    steps:
    - uses: actions/checkout@v1
    - name: ls
      run: ls -alh; pwd
    - name: verify codegen
      run: ./test.sh

Trying it with container:

https://github.com/OctoKode/test1/runs/3767922670?check_suite_focus=true

Error: Container feature is not supported when runner is already running inside container.

That is a limitation of DinD which GitHub doesn't support.

Whatever you're running into is specific to the linux agent, not the self-hosted runner.

name: Test

on: [push]

jobs:
  test:
    runs-on: self-hosted
    container: debian:buster
    strategy:
      matrix:
        build:
        - name: debug
        - name: production
    steps:
    - uses: actions/checkout@v1
    - name: ls
      run: ls -alh; pwd
    - name: verify codegen
      run: ./test.sh

The container feature does work with the configuration I provided when there is no matrix, in both self-hosted and GitHub-hosted runner mode. It's only in self-hosted mode with this Docker runner that matrix + container fails (I have no data on matrix + container without DinD).

This is the error you are chasing (taken from our failing action).

Anyway, container works in DinD (with all the appropriate Docker-in-Docker permissions).

Configuration:

[...]
jobs:
  build:
    runs-on: [self-hosted,linux]
    container: debian:buster
    steps:
[...]

Adding a matrix to that configuration does not work.

Off-topic FYI: DinD nesting also works (Yocto runs Docker-in-Docker within the job we fire off from GitHub Actions, which is itself in Docker). But that's not using GHA containers.

Using

name: Test

on: [push]

jobs:
  test:
    runs-on: self-hosted
    container: debian:buster
    strategy:
      matrix:
        build:
        - name: debug
        - name: production
    steps:
    - uses: actions/checkout@v1
    - name: ls
      run: ls -alh; pwd
    - name: verify codegen
      run: ./test.sh

fails as expected

Adding linux to runs-on: shouldn't make this feature work. I think you might have found an Actions bug that should be reported upstream, but this shouldn't work without linux in runs-on, so I'm not sure what can be done from the self-hosted runner side. The runner is working as expected within the DinD limitation.

actions/runner#406
actions/runner#367

I've run into this issue as well. The problem seems to be that the first thing the runner does when it spins up the services is prune existing resources with the same label.

Each matrix node spins up with the same label: the first one prunes and creates a network, then the second one spins up and prunes again (with the SAME LABEL), so the first node's network no longer exists and the jobs fail.

There must be a way to set a unique label per matrix node, to isolate the services and prevent them from pruning each other when they start up.
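The race described above can be modeled without Docker at all. The following is a toy sketch, not the runner's code: a temporary directory stands in for the shared Docker daemon, each file in it is a "network" tagged with a label, and the job names and labels (debug, production, abc123, def456) are made up for illustration.

```shell
#!/bin/sh
# Toy model of the prune-then-create race between matrix jobs.
# NOT the runner's code: a temp directory stands in for the shared Docker
# daemon, and each file in it is a "network" whose content is its label.
DAEMON=$(mktemp -d)
trap 'rm -rf "$DAEMON"' EXIT

# Delete every "network" carrying the given label (like a label-filtered prune).
prune_label() {
  for net in "$DAEMON"/*; do
    [ -e "$net" ] || continue  # empty daemon: the glob did not match anything
    if [ "$(cat "$net")" = "$1" ]; then
      rm -f "$net"
    fi
  done
}

# One matrix job starting up: prune its own label ($1), then create its
# network named after the job ($2).
start_job() {
  prune_label "$1"
  printf '%s' "$1" > "$DAEMON/github_network_$2"
}

network_exists() {
  [ -e "$DAEMON/github_network_$1" ]
}

# Both jobs share label abc123: the second prune removes the first network.
start_job abc123 debug
start_job abc123 production
network_exists debug || echo "debug network not found"
```

With a shared label only production's network survives, which mirrors the "network ... not found" error above; starting the second job with a different label (say def456) leaves both networks in place.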

Update: This is a really stupid hack, but this is how I worked around it:

  1. Download and set up runner resources in a temporary directory
  2. Create a randomized string and then create a directory, e.g. /home/github/runner-root-$RANDOM_STRING
  3. Configure and start the runner from there

I found that the GitHub runner uses the SHA-256 hash of its root directory path (https://github.com/actions/runner/blob/42fe704132b7fc343e59ff15a8dbaace19e38e62/src/Runner.Worker/Container/DockerCommandManager.cs#L49) to create the Docker label. That path is the same for every container that installs the runner at the same location, so all such containers get the same label and prune each other when they start up as matrix nodes.
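To see why identical install paths collide, here is a minimal sketch that approximates the label as the SHA-256 hex digest of the root path string. This is an assumption for illustration: the runner may encode or shorten the digest differently, and the paths used here are only examples.

```shell
#!/bin/sh
# Sketch: approximate the runner's Docker label as the SHA-256 hex digest of
# its root directory path. Assumption: the real runner may encode or truncate
# the digest differently; only the "same path => same label" property matters.
runner_label() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Example paths (assumptions): two runner containers with the same install
# path, and one root randomized as in the workaround above.
shared_a=$(runner_label /actions-runner)
shared_b=$(runner_label /actions-runner)
randomized=$(runner_label /home/github/runner-root-3f9a1c0b7d2e)

if [ "$shared_a" = "$shared_b" ]; then
  echo "same path, same label: $shared_a"
fi
if [ "$shared_a" != "$randomized" ]; then
  echo "randomized root, different label: $randomized"
fi
```

Because the hash depends only on the path string, nothing about the container itself (hostname, matrix values) changes the label; only a different root path does.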

In my Dockerfile, I have this:

# Create user and cd to $HOME
USER github
WORKDIR /home/github

# Create a temporary directory for all runner resources and extract the runner code into that directory
RUN mkdir -p runner-resources
RUN GITHUB_RUNNER_VERSION=$(curl --silent "https://api.github.com/repos/actions/runner/releases/latest" | jq -r '.tag_name[1:]') \
    && curl -Ls https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz -C runner-resources \
    && sudo ./runner-resources/bin/installdependencies.sh

# Copy custom initialize script to $HOME
COPY --chown=github:github initialize.sh ./
RUN chmod u+x ./initialize.sh

# Copy all custom runner resources into runner-resources
COPY --chown=github:github entrypoint.sh runsvc.sh ./runner-resources/
RUN sudo chmod u+x ./runner-resources/entrypoint.sh ./runner-resources/runsvc.sh

# Start using my custom initialization script to randomize the runner root
ENTRYPOINT ["/home/github/initialize.sh"]

This is what my init script looks like:

#!/bin/sh
set -e

# Create a random runner root name to ensure docker labels are random and unique per container.
# We would not need this if the containers didn't share the same docker instance.
# openssl rand -hex 6 emits 6 random bytes, i.e. 12 hex characters.
RANDOM_STRING=$(openssl rand -hex 6)
mkdir -p "runner-root-$RANDOM_STRING"
cp -rf runner-resources/* "runner-root-$RANDOM_STRING"

# exec replaces the shell, so the runner receives the container's stop signal directly
cd "runner-root-$RANDOM_STRING" && exec ./entrypoint.sh

Now the runner will start from a directory such as /home/github/runner-root-3f9a1c0b7d2e, where the suffix is 12 random hex characters (openssl rand -hex 6 emits 6 random bytes), and will create different labels for networks and containers in Docker, preventing them from pruning each other.

Probably not the best way to deal with this, but it seems to work.
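A possibly simpler variant, assuming mktemp is available in the image, is to let mktemp -d create the uniquely named root directory instead of building the suffix with openssl and a separate mkdir. This is a swapped-in technique, not the author's script, and make_runner_root is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical variant of initialize.sh (not the author's script): mktemp -d
# picks a unique suffix and creates the directory in one step, so no openssl
# call or separate mkdir is needed.
set -eu

# make_runner_root BASE_DIR RESOURCES_DIR
# Creates BASE_DIR/runner-root-XXXXXX, copies the runner files (including
# dotfiles) into it and prints the new path.
make_runner_root() {
  root=$(mktemp -d "$1/runner-root-XXXXXX")
  cp -R "$2"/. "$root"/
  printf '%s\n' "$root"
}

# As a container entrypoint it might be used like this (paths taken from the
# Dockerfile above):
#   RUNNER_ROOT=$(make_runner_root /home/github /home/github/runner-resources)
#   cd "$RUNNER_ROOT" && exec ./entrypoint.sh
```

mktemp creates the directory readable only by its owner (mode 0700), which should be fine as long as the runner runs as the same github user that creates it.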