DAG loses concurrency with module chaining
kpenfound opened this issue
When chaining module functions from the CLI, it appears that the expected DAG concurrency is lost. Here is an example:
Reproduced with Dagger v0.11.3
```go
package main

import (
	"context"
	"fmt"
	"math/rand"
)

type MyModule struct{}

type Job struct {
	*Container
	*Directory
	Key string
}

type Jobs struct {
	Jobs []Job
}

func (r *MyModule) JobGroup(
	ctx context.Context,
	// +optional
	sync bool,
) Jobs {
	jobs := Jobs{Jobs: []Job{
		r.echoAndSleep(5),
		r.echoAndSleep(6),
		r.echoAndSleep(7),
	}}
	if sync {
		jobs.Out().Sync(ctx)
	}
	return jobs
}

func (r *MyModule) Hack(ctx context.Context) *Directory {
	jobs := r.JobGroup(ctx, false)
	return jobs.Out()
}

func (r *MyModule) echoAndSleep(seconds int) Job {
	forceRebuild := fmt.Sprint(rand.Int())
	// forceRebuild = "no"
	ctr := dag.Container().From("alpine").
		WithExec([]string{"echo", forceRebuild}).
		WithExec([]string{"sleep", fmt.Sprint(seconds)})
	dir := ctr.Directory("/etc")
	return Job{Container: ctr, Directory: dir, Key: fmt.Sprintf("/sleep-%v", seconds)}
}

func (r Jobs) Out() *Directory {
	out := dag.Directory()
	for _, job := range r.Jobs {
		out = out.WithDirectory(job.Key, job.Directory)
	}
	return out
}
```
The 3 jobs run concurrently with:

```
dagger call job-group --sync out
dagger call hack
```

The 3 jobs run serially with:

```
dagger call job-group out
```
The final command, `job-group out`, is functionally identical to `hack`; the only difference is that the functions are chained by the CLI rather than in code.
Very cool find! It's not the CLI though, it’s the engine:
```
dagger call hack          → query{hack{sync}}
dagger call job-group out → query{jobGroup{out{sync}}}
```
Something's happening in the `jobGroup` step, where all the IDs are needed for deserialization, but it seems it's syncing too in the process. I wonder if the telemetry integration is forcing evaluation somehow. Something must be.
In `hack`, there's no serialization to pass the result from one function to the other.
You're right, something is forcing the sequential evaluation of everything.
If the second function decides to evaluate only one object, the others are still evaluated when chaining from the CLI. `HackOne`, chained in code, works as expected:
```
dagger call job-group one
-> 5.... 6.... 7....

dagger call hack-one
-> 6....
```
```go
func (jobs Jobs) One() *Container {
	return jobs.Jobs[1].Container
}

func (r *MyModule) HackOne(ctx context.Context) *Container {
	return r.JobGroup(ctx, false).One()
}
```
Changing the `Directory` field to a string fixes the issue for my use case:

```go
type Job struct {
	*Container
	OutputDir, Key string
}
```
Wrapping the directory in a container (`dag.Container().WithRootfs(dir)`) doesn't seem to help.
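One hedged guess at why the string workaround helps, sketched with stand-in types (nothing here is the real Dagger SDK; `Lazy`, `JobEager`, and `JobString` are hypothetical): serializing a struct that holds a lazy engine handle requires resolving that handle's ID, while a plain string field serializes without touching the engine at all.

```go
package main

import "fmt"

// Lazy is a stand-in for a lazy engine object whose serialization
// forces evaluation (hypothetical, not the real Dagger type).
type Lazy struct {
	evaluated *bool
}

// ID resolves the object, marking it as evaluated.
func (l Lazy) ID() string {
	*l.evaluated = true
	return "id"
}

// JobEager carries the lazy handle: serializing it forces work.
type JobEager struct {
	Dir Lazy
	Key string
}

func (j JobEager) Serialize() string { return j.Dir.ID() + j.Key }

// JobString carries only a path string, so serialization is free.
type JobString struct {
	OutputDir, Key string
}

func (j JobString) Serialize() string { return j.OutputDir + j.Key }

func main() {
	forced := false
	eager := JobEager{Dir: Lazy{&forced}, Key: "/sleep-5"}
	eager.Serialize()
	fmt.Println("lazy handle forced:", forced) // true

	s := JobString{OutputDir: "/etc", Key: "/sleep-5"}
	fmt.Println("string serialized:", s.Serialize())
}
```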