chrislusf / gleam

Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

K8s executor failed to start: "no such file or directory"

dominicfollett opened this issue

commented

Hi there, I am familiarizing myself with gleam because I want to contribute to the project.

I've deployed gleam on a local Docker Desktop K8s cluster. On a separate pod I run the word-count demo with:

go run main.go --distributed=true

(The demo reads a test file that I have placed on that pod.)
However, I see many messages such as "Failed 1 time to start" and "Failed 2 times to start", along with the following error:

2024/04/25 16:36:13 Start Job Status URL http://master:45326/job/4191823757
2024/04/25 16:36:13 10.1.0.68:45327 1:0> starting with 1 MB memory...
2024/04/25 16:36:14 10.1.0.66:45327 2:2-3:2-4:2-5:2-6:2> starting with 0 MB memory...
2024/04/25 16:36:15 10.1.0.66:45327 2:2-3:2-4:2-5:2-6:2>2024/04/25 16:36:15 Failed 1 time to start /data/4191823757/main [/data/4191823757/main --distributed=true -gleam.mapper m1 -gleam.executor [::]:37577 -flow.hashcode 4191823757 -flow.stepId 2 -flow.taskId 2] []:
Start error fork/exec /data/4191823757/main: no such file or directory:&{/data/4191823757/main [/data/4191823757/main --distributed=true -gleam.mapper m1 -gleam.executor [::]:37577 -flow.hashcode 4191823757 -flow.stepId 2 -flow.taskId 2] [] /data/4191823757 0xc000260000 0xc00024c040 0xc00009c010 [] <nil> <nil> <nil> 0xc0000c4dc0 <nil> false [0xc000260018 0xc000260040 0xc00009c010] [0xc000260018 0xc000260040] [0xc000260020 0xc000260038] [0x8a2d10 0x8a2e40] <nil> <nil>}

So it looks like the agents don't have the main executable: Start error fork/exec /data/4191823757/main: no such file or directory.

However, when I check each agent, the executable does exist at that path:

/ # ls /data/4191823757
main

Am I missing something? Any suggestions would be welcome. Thank you for this awesome project. I hope I will be able to help out soon.

commented

I've figured out my issue: the pod running the word-count demo was not using an Alpine container, so I presume the binary it built was incompatible with the agents. Using an Alpine-based pod worked like a charm :)
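
For anyone who hits the same error, here is one likely explanation, offered as an assumption rather than something confirmed in this thread. A Go binary built on a glibc-based image is often dynamically linked and names a loader such as /lib64/ld-linux-x86-64.so.2 in its ELF header. Alpine uses musl and does not ship that loader, so fork/exec reports "no such file or directory" even though the binary itself is present. The sketch below uses Go's standard debug/elf package to print the loader a binary asks for and to check whether it exists on the current machine. The names interpreterOf and trimInterp are hypothetical helpers written for this example and are not part of gleam.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"os"
)

// trimInterp strips the trailing NUL bytes from a raw PT_INTERP segment.
// (Hypothetical helper for this example, not part of gleam.)
func trimInterp(raw []byte) string {
	return string(bytes.TrimRight(raw, "\x00"))
}

// interpreterOf returns the dynamic loader path requested by an ELF binary,
// or "" when the binary has no PT_INTERP segment (statically linked).
func interpreterOf(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		raw, err := io.ReadAll(p.Open())
		if err != nil {
			return "", err
		}
		return trimInterp(raw), nil
	}
	return "", nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checkinterp <path-to-binary>")
		return
	}

	interp, err := interpreterOf(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot inspect binary:", err)
		os.Exit(1)
	}
	if interp == "" {
		fmt.Println("statically linked: no loader required")
		return
	}

	// A missing loader makes exec fail with ENOENT,
	// which Go reports as "no such file or directory".
	if _, err := os.Stat(interp); err != nil {
		fmt.Printf("loader %s is missing on this machine: %v\n", interp, err)
		return
	}
	fmt.Printf("loader %s is present\n", interp)
}
```

Running this inside an agent pod against the copied binary (for example /data/4191823757/main from the log above) would show whether a missing loader is the cause. If it is, building the driver on an Alpine-based image, as described above, avoids the mismatch. Building with CGO_ENABLED=0 usually produces a statically linked binary and may also help, though that was not tested in this thread.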