dagger / dagger

An engine to run your pipelines in containers

Home Page:https://dagger.io

DAG loses concurrency with module chaining

kpenfound opened this issue · comments

When chaining module functions from the CLI, it appears that the expected DAG concurrency is lost. Here is an example:

Reproduced with Dagger v0.11.3

package main

import (
	"context"
	"fmt"
	"math/rand"
)

type MyModule struct{}

type Job struct {
	*Container
	*Directory
	Key string
}

type Jobs struct {
	Jobs []Job
}

func (r *MyModule) JobGroup(
	ctx context.Context,
	// +optional
	sync bool,
) Jobs {
	jobs := Jobs{Jobs: []Job{
		r.echoAndSleep(5),
		r.echoAndSleep(6),
		r.echoAndSleep(7),
	}}
	if sync {
		jobs.Out().Sync(ctx)
	}
	return jobs
}

func (r *MyModule) Hack(ctx context.Context) *Directory {
	jobs := r.JobGroup(ctx, false)
	return jobs.Out()
}

func (r *MyModule) echoAndSleep(seconds int) Job {
	// Random value busts the cache so the execs re-run on every call.
	forceRebuild := fmt.Sprint(rand.Int())
	// forceRebuild = "no" // uncomment to allow caching
	ctr := dag.Container().From("alpine").
		WithExec([]string{"echo", forceRebuild}).
		WithExec([]string{"sleep", fmt.Sprint(seconds)})
	dir := ctr.Directory("/etc")
	return Job{Container: ctr, Directory: dir, Key: fmt.Sprintf("/sleep-%v", seconds)}
}

func (r Jobs) Out() *Directory {
	out := dag.Directory()
	for _, job := range r.Jobs {
		out = out.WithDirectory(job.Key, job.Directory)
	}
	return out
}

The 3 jobs run concurrently with:
dagger call job-group --sync out
dagger call hack

The 3 jobs run serially with:
dagger call job-group out

The final command, job-group out, is functionally identical to hack; the only difference is that the functions are chained by the CLI rather than in code.

Very cool find! It's not the CLI though, it’s the engine:

  • dagger call hack → query{hack{sync}}
  • dagger call job-group out → query{jobGroup{out{sync}}}

Something’s happening in the jobGroup step, where all the IDs are needed for deserialization, but it seems it’s syncing them in the process too. I wonder if the telemetry integration is somehow forcing evaluation. Something must be.

In hack there's no serialization to pass the result from one function to the other.

You're right, something is forcing the sequential evaluation of everything.
Even if the second function decides to evaluate only one object, the others are still evaluated.
HackOne works as expected:

  • dagger call job-group one -> 5.... 6.... 7....
  • dagger call hack-one -> 6....

func (jobs Jobs) One() *Container { return jobs.Jobs[1].Container }

func (r *MyModule) HackOne(ctx context.Context) *Container {
	return r.JobGroup(ctx, false).One()
}

Changing the Directory field to a string fixes the issue for my use case.

type Job struct {
	*Container
	OutputDir, Key string
}
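With that change, the Out accessor above no longer has a Directory field to read, so presumably it resolves the directory lazily from the container instead. A sketch under that assumption (OutputDir is taken to hold the path, e.g. "/etc"; this is Dagger module code and depends on the generated bindings, so it is a fragment, not a standalone program):

```go
func (r Jobs) Out() *Directory {
	out := dag.Directory()
	for _, job := range r.Jobs {
		// Resolve the directory lazily from the container by path, so only
		// the container crosses the serialization boundary between calls.
		out = out.WithDirectory(job.Key, job.Container.Directory(job.OutputDir))
	}
	return out
}
```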

Wrapping the directory in a container (dag.Container().WithRootfs(dir)) doesn't seem to help, though.