gaia-app / gaia

Gaia is a Terraform 🌍 UI for your modules, and self-service infrastructure 👨‍💻

Home Page: https://gaia-app.io

🐛 Caused by: java.io.IOException: native connect() failed : No such file or directory

henrikmotzkus opened this issue

Describe the bug
I get "Caused by: java.io.IOException: native connect() failed : No such file or directory" when I try to run a stack.

Gaia is running on Azure AKS.

To Reproduce

az login
az account set --subscription $subscriptionid
az group create --resource-group $resourcegroup --location $location
az aks create --resource-group $resourcegroup --name $clusterName --node-count 1 --enable-addons monitoring --generate-ssh-keys
az aks get-credentials --resource-group $resourcegroup --name $clusterName
kubectl apply -f https://raw.githubusercontent.com/henrikmotzkus/AutomationDemo/main/12_Terraform_Gaia/gaia.yaml

Expected behavior
no error

Screenshots
2022-01-06 17:29:09.074 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Step found ffbd90c5-f8ff-4147-ace3-c93ea2cedd1e. Running.
2022-01-06 17:29:09.757 INFO 1 --- [ gaia-runner-1] io.gaia_app.runner.StepRunner : Starting step ffbd90c5-f8ff-4147-ace3-c93ea2cedd1e execution.
2022-01-06 17:29:10.554 ERROR 1 --- [tream-757286532] c.g.d.api.async.ResultCallbackTemplate : Error during callback

java.io.UncheckedIOException: Error while executing Request{method=POST, path=/images/create?fromImage=hashicorp%2Fterraform%3Alatest, body=null, bodyBytes=null, hijackedInput=null, headers={accept=application/octet-stream, content-type=application/json}}
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:233) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228) ~[docker-java-core-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269) ~[docker-java-core-3.2.7.jar!/:na]
at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
Caused by: java.io.IOException: native connect() failed : No such file or directory
at com.github.dockerjava.okhttp.UnixDomainSocket.connect(UnixDomainSocket.java:157) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.UnixSocketFactory$1.connect(UnixSocketFactory.java:29) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.execute(RealCall.java:81) ~[okhttp-3.14.9.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient$OkResponse.<init>(OkDockerHttpClient.java:256) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:230) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
... 3 common frames omitted

2022-01-06 17:29:10.556 ERROR 1 --- [ gaia-runner-1] .a.i.SimpleAsyncUncaughtExceptionHandler : Unexpected exception occurred invoking async method: public void io.gaia_app.runner.StepRunner.runStep(io.gaia_app.runner.RunnerStep)

java.io.UncheckedIOException: Error while executing Request{method=POST, path=/images/create?fromImage=hashicorp%2Fterraform%3Alatest, body=null, bodyBytes=null, hijackedInput=null, headers={accept=application/octet-stream, content-type=application/json}}
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:233) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228) ~[docker-java-core-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269) ~[docker-java-core-3.2.7.jar!/:na]
at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
Caused by: java.io.IOException: native connect() failed : No such file or directory
at com.github.dockerjava.okhttp.UnixDomainSocket.connect(UnixDomainSocket.java:157) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.UnixSocketFactory$1.connect(UnixSocketFactory.java:29) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.execute(RealCall.java:81) ~[okhttp-3.14.9.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient$OkResponse.<init>(OkDockerHttpClient.java:256) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:230) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
... 3 common frames omitted

2022-01-06 17:29:14.079 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps

Hey @henrikmotzkus 👋

Thank you for opening this detailed issue.
For now, the Gaia runner cannot run on Kubernetes because it needs direct access to a Docker daemon, hence the error.
We're currently working on native Kubernetes support, and I hope we can release a version of the runner that supports Kubernetes in the next few weeks.

I'll notify you in this issue when the feature is available in the runner.
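
To confirm that the error comes from a missing Docker socket, a quick check can be run inside the runner container. This is only a minimal sketch: the function name `docker_socket_present` is hypothetical, and the default path `/var/run/docker.sock` is assumed to be the socket the runner tries to connect to.

```shell
# Hypothetical helper: succeeds (exit code 0) only if the given path exists
# and is a Unix domain socket.
# Assumption: the runner looks for the Docker socket at /var/run/docker.sock.
docker_socket_present() {
  sock="${1:-/var/run/docker.sock}"
  [ -S "$sock" ]
}
```

For example, `docker_socket_present || echo "no Docker socket in this container"` would print the message in a pod where the socket is not mounted.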

FYI @juwit @henrikmotzkus: I might have made it work with this manifest, which exposes the Docker socket to the container on Kubernetes. I will still make changes to the manifest, but at least I got it to connect to the socket:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gaia-runner
  labels:
    app: gaia-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gaia-runner
  template:
    metadata:
      labels:
        app: gaia-runner
    spec:
      containers:
      - name: gaia-runner
        image: gaiaapp/runner:v2.2.0
        ports:
        - containerPort: 8080
        env:
        - name: GAIA_URL
          value: "http://gaia:8080"
        - name: GAIA_RUNNER_API_PASSWORD
          value: "123456"
        volumeMounts:
          - name: dockersock
            mountPath: "/var/run/docker.sock"
      volumes:
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock            

Hello @amitai-devops

Just out of interest, which Kubernetes version are you using? Since 1.20, support for Docker as a container runtime has been deprecated.

Regards

@candidson True, I am using 1.18 on EKS. Regardless, I could have used Docker-in-Docker to force the runner to run inside a Docker runtime. I do not know whether the Kubernetes change will affect this, but thanks for pointing it out.

Hello @amitai-devops
From my experience, Docker-in-Docker wouldn't work directly, since you no longer have access to any Docker socket. Running Docker inside an OCI-based image that also hosts gaia-runner might work, but the current gaia-runner code expects a Docker socket as well, so that wouldn't work either.
I was working on rewriting the gaia-runner code to use the Podman API, for example, but I understood from @juwit that he is working on an even better solution that leverages the Kubernetes native APIs directly: gaia-app/runner#56

@candidson I am waiting for a solution, as I also want to solve this. I've actually gotten as far as making the runner spin up a container, but I'm having trouble with docker-java: the container doesn't resolve addresses, as if it had no access to a DNS server, even after trying several different Terraform Docker images in the module:

[gaia] using image kjmkznr/terraform:latest
[gaia] installing curl
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.15/main: temporary error (try again later)
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.15/community: temporary error (try again later)
[gaia] cloning https://github.com/terraform-aws-modules/terraform-aws-ec2-instance
Cloning into 'module'...
fatal: unable to access 'https://github.com/terraform-aws-modules/terraform-aws-ec2-instance/': Could not resolve host: github.com

Hey 👋

Yes, it seems that the containers the runner spins up can't access the internet.
It may be related to a network limitation on your cluster or on the Docker host.

If you use the underlying Docker daemon of a Kubernetes cluster, the daemon will probably not be configured with a bridge network to the host, which may explain the issue. In that case, you may have to create this network.

Can you try to run the following commands on your docker host:

Test docker networks:

docker network ls

expected output (bridge network is important)

NETWORK ID     NAME               DRIVER    SCOPE
0f096fefbc9f   bridge             bridge    local
d20d9f4727bf   host               host      local
3a91ee5ac87c   none               null      local

If the bridge network doesn't exist, you may try:

docker network create -d bridge bridge-network

Test DNS configuration

 docker run --rm -it alpine cat /etc/resolv.conf

expected output (with IP depending on your DHCP configuration)

nameserver 192.168.1.1

Hope it helps diagnose the issue
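
The two checks above could also be scripted. The following is a rough sketch, not part of Gaia: the function names `has_bridge_network` and `has_nameserver` are hypothetical, and both read their input from stdin so they can be fed the output of the commands shown above.

```shell
# Hypothetical helper: reads "NAME DRIVER" pairs on stdin (one network per line)
# and succeeds if at least one network uses the bridge driver.
has_bridge_network() {
  awk '$2 == "bridge" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: reads resolv.conf content on stdin and succeeds
# if it contains at least one "nameserver <address>" line.
has_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+'
}
```

On the Docker host, these could be used as `docker network ls --format '{{.Name}} {{.Driver}}' | has_bridge_network` and `docker run --rm alpine cat /etc/resolv.conf | has_nameserver`, checking the exit code of each.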

@juwit You were right. Using a solution similar to yours, I was able to make the Gaia runner work on Kubernetes.
FYI: @henrikmotzkus @candidson
Steps:

  1. On your kubernetes node, run the following commands to create a bridge network:
cp /etc/docker/daemon.json /etc/docker/daemon_backup.json
echo -e '.bridge="docker0" | ."live-restore"=false' >  /etc/docker/jq_script
jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json
systemctl restart docker
  2. Deploy the runner manifest, and connect the pod to the host network of the Kubernetes node:
    • The Gaia URL must be reachable from the Kubernetes node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gaia-runner
  labels:
    app: gaia-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gaia-runner
  template:
    metadata:
      labels:
        app: gaia-runner
      annotations:        
        sidecar.istio.io/inject: "false" # remove all sorts of service mesh configurations that could interfere
    spec:
      hostNetwork: true # note this line
      containers:
      - name: gaia-runner
        image: gaiaapp/runner:v2.2.0
        ports:
        - containerPort: 8080
        env:
        - name: GAIA_URL
          value: "https://gaia.your.url"
        - name: GAIA_RUNNER_API_PASSWORD
          value: "123456"                         
        volumeMounts:
          - name: dockersock
            mountPath: "/var/run/docker.sock"
      volumes:
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
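
Before overwriting daemon.json in step 1, it may be worth previewing what the jq filter produces. This is a small sketch assuming `jq` is installed on the node; the function name `preview_daemon_json` is hypothetical, and it applies the same filter as step 1 without modifying the file.

```shell
# Hypothetical helper: prints the daemon configuration that the filter from
# step 1 would produce for the file given as $1. The file itself is not changed.
preview_daemon_json() {
  jq '.bridge = "docker0" | ."live-restore" = false' "$1"
}
```

Running `preview_daemon_json /etc/docker/daemon.json` prints the resulting configuration, with `bridge` set to `docker0`, `live-restore` set to `false`, and every other key left as it was.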

See Gaia Runner output when running a stack:

[gaia] using image hashicorp/terraform:latest
[gaia] installing curl
[gaia] cloning https://github.com/terraform-aws-modules/terraform-aws-ec2-instance
Cloning into 'module'...
[gaia] generating backend configuration
[gaia] generating tfvars variable file
[gaia] running terraform init
Terraform v1.1.3
on linux_amd64

Initializing the backend...

Successfully configured the backend "http"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.72.0"...
- Installing hashicorp/aws v3.72.0...
- Installed hashicorp/aws v3.72.0 (signed by HashiCorp)

Great!
This kind of workaround will not be necessary with the Kubernetes runner planned for the next runner release.
I'll close this issue.

Hey there 👋

We've implemented the Kubernetes executor in the latest Runner version (2.3.0). I think this will help you, as the Runner no longer needs a Docker engine and can interact directly with the Kubernetes API.

Here are some links: