matrix docker (container) builds fail
splitice opened this issue · comments
@myoung34 I suspect this is a GitHub Actions bug. Are you able to confirm? If so, I'll file an issue with them accordingly.
name: Build
on:
  push:
jobs:
  build:
    runs-on: [self-hosted,linux]
    container: debian:buster
    strategy:
      matrix:
        build:
          - name: debug
          - name: production
    [...]
Fails to start:
/usr/bin/docker start ce81cdc1a1979d041cd6f5ae1fd5623a9a63ae09c1cac85b897a1f6059b56e07
Error response from daemon: network github_network_4684e53698f34574a4cc3c670171f70c not found
Error: failed to start containers: ce81cdc1a1979d041cd6f5ae1fd5623a9a63ae09c1cac85b897a1f6059b56e07
Error: Docker start fail with exit code 1
The matrix strategy appears to be the cause, but I can't see why.
Feedback appreciated.
@splitice it worked for me here: https://github.com/OctoKode/test1/runs/3767512089?check_suite_focus=true
name: Test
on: [push]
jobs:
  test:
    runs-on: self-hosted
    strategy:
      matrix:
        build:
          - name: debug
          - name: production
    steps:
      - uses: actions/checkout@v1
      - name: ls
        run: ls -alh; pwd
      - name: verify codegen
        run: ./test.sh
Try with container:
https://github.com/OctoKode/test1/runs/3767922670?check_suite_focus=true
Error: Container feature is not supported when runner is already running inside container.
This is the DinD limitation, which GitHub doesn't support.
Whatever you're running into is on the linux agent, not the self-hosted one.
name: Test
on: [push]
jobs:
  test:
    runs-on: self-hosted
    container: debian:buster
    strategy:
      matrix:
        build:
          - name: debug
          - name: production
    steps:
      - uses: actions/checkout@v1
      - name: ls
        run: ls -alh; pwd
      - name: verify codegen
        run: ./test.sh
The container feature does work with the configuration I provided without the matrix, in either self-hosted or GitHub runner mode. It's only in self-hosted mode with this docker runner that matrix + container fails (I have no data on matrix + container without DinD).
This is the error you are chasing (taken from our failing action).
Anyway, container works in DinD (with all appropriate docker-in-docker permissions).
Configuration:
[...]
jobs:
  build:
    runs-on: [self-hosted,linux]
    container: debian:buster
    steps:
    [...]
Adding matrix to this configuration makes it fail.
Off-topic, FYI: DinD nesting also works (Yocto does docker in docker within the job we fire off from GitHub Actions - in docker). But that's not using GHA containers.
Using
name: Test
on: [push]
jobs:
  test:
    runs-on: self-hosted
    container: debian:buster
    strategy:
      matrix:
        build:
          - name: debug
          - name: production
    steps:
      - uses: actions/checkout@v1
      - name: ls
        run: ls -alh; pwd
      - name: verify codegen
        run: ./test.sh
fails as expected
Adding linux to runs-on: shouldn't make this feature work. I think you might have found an Actions bug that should be reported upstream, but this shouldn't work without linux in runs-on, so I'm not sure what can be done from the self-hosted runner side. The runner is working as expected within the DinD limitation.
I've run into this issue as well. The problem seems to be that the first thing the runner does when it spins up the services is prune existing data with the same label.
Each node in the matrix spins up with the same label. The first one prunes and creates a network, then the second one spins up and prunes again (with the SAME LABEL), so the network no longer exists and the jobs fail.
There must be a way to set a unique label per matrix node to isolate the services and prevent them from pruning each other when they start up.
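The race described above can be sketched as a toy model. This is not Docker: a temporary directory stands in for the daemon's network list, and the function names, network names, and labels are all hypothetical. It only illustrates how a prune keyed on a shared label removes a sibling job's network, and how unique labels avoid that.

```shell
#!/bin/sh
# Toy model only: a temp directory stands in for the Docker daemon's network list.
# All names below (functions, labels, networks) are made up for illustration.
registry=$(mktemp -d)

prune_label()     { rm -f "$registry"/*."$1"; }   # $1 = label; removes everything carrying it
create_network()  { : > "$registry/$2.$1"; }      # $1 = label, $2 = network name
start_container() {                               # $1 = label, $2 = network name
  if [ -e "$registry/$2.$1" ]; then echo "started"; else echo "network not found"; fi
}

# Two matrix jobs sharing one label: the second prune deletes the first job's network.
prune_label shared; create_network shared net_debug
prune_label shared; create_network shared net_production
shared_result=$(start_container shared net_debug)

# Two matrix jobs with unique labels: each prune only touches its own resources.
prune_label aaa111; create_network aaa111 net_debug
prune_label bbb222; create_network bbb222 net_production
unique_result=$(start_container aaa111 net_debug)

echo "shared label:  $shared_result"
echo "unique labels: $unique_result"
rm -rf "$registry"
```

With a shared label the debug job's start step finds no network, which matches the shape of the "network ... not found" error reported earlier in the thread.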
Update: This is a really stupid hack, but this is how I worked around it:
- Download and set up runner resources in a temporary directory
- Create a randomized string and then create a directory, e.g. /home/github/runner-root-$RANDOM_STRING
- Configure and start the runner from there
I found that the GitHub runner uses the SHA-256 hash of its root directory path (https://github.com/actions/runner/blob/42fe704132b7fc343e59ff15a8dbaace19e38e62/src/Runner.Worker/Container/DockerCommandManager.cs#L49) to create the docker label. The hash is the same for every container where that path is the same, so all containers get the same label and prune each other when they start up as matrix nodes.
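A minimal sketch of why identical root paths collide, assuming only that the label is derived from a SHA-256 hash of the root path as described above. The helper `label_for` and the example paths are hypothetical, and the real runner may truncate or format the hash differently.

```shell
#!/bin/sh
# Hypothetical helper: hash a runner root path the way a path-derived label might be built.
# Assumption: the label depends only on the path string, nothing else.
label_for() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Two containers built from the same image share the same root path...
same_a=$(label_for /home/github/actions-runner)
same_b=$(label_for /home/github/actions-runner)

# ...while randomized root directories give each container its own path.
rand_a=$(label_for /home/github/runner-root-0a1b2c3d4e5f)
rand_b=$(label_for /home/github/runner-root-f5e4d3c2b1a0)

[ "$same_a" = "$same_b" ] && echo "same path: labels collide"
[ "$rand_a" != "$rand_b" ] && echo "random paths: labels differ"
```

Because the input to the hash is just the path, changing the directory name is enough to change the label, which is what the workaround relies on.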
In my Dockerfile, I have this:
# Create user and cd to $HOME
USER github
WORKDIR /home/github
# Create a temporary directory for all runner resources and extract the runner code into that directory
RUN mkdir -p runner-resources
RUN GITHUB_RUNNER_VERSION=$(curl --silent "https://api.github.com/repos/actions/runner/releases/latest" | jq -r '.tag_name[1:]') \
&& curl -Ls https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz -C runner-resources \
&& sudo ./runner-resources/bin/installdependencies.sh
# Copy custom initialize script to $HOME
COPY --chown=github:github initialize.sh ./
RUN chmod u+x ./initialize.sh
# Copy all custom runner resources into runner-resources
COPY --chown=github:github entrypoint.sh runsvc.sh ./runner-resources/
RUN sudo chmod u+x ./runner-resources/entrypoint.sh ./runner-resources/runsvc.sh
# Start using my custom initialization script to randomize the runner root
ENTRYPOINT ["/home/github/initialize.sh"]
This is what my init script looks like:
#!/bin/sh
set -e
# Create a random runner root name to ensure docker labels are random and unique per container.
# We would not need this if the containers didn't share the same docker instance
RANDOM_STRING=$(openssl rand -hex 6)
RUNNER_ROOT="runner-root-$RANDOM_STRING"
mkdir -p "$RUNNER_ROOT"
cp -rf runner-resources/* "$RUNNER_ROOT"
cd "$RUNNER_ROOT" && exec ./entrypoint.sh
Now the runner will start from a directory that looks like /home/github/runner-root-0a1b2c3d4e5f, where the last 12 hex characters are random (openssl rand -hex 6 emits 6 random bytes as 12 hex digits). It will create different labels for networks and containers in docker, preventing them from pruning each other.
Probably not the best way to deal with this, but it seems to work.