gocd / gocd

GoCD - Continuous Delivery server main repository

Home Page:https://www.gocd.org

Proposal: New BuildCommand based Server-Agent Communication

wpc opened this issue · comments

What this is about

Currently, the interface between server and agent is based on serializing a big graph of Java objects rooted at BuildWork. This approach has quite a few drawbacks; the major one, IMHO, is that there is no clear boundary between server and agent, which leaves the agent implementation tightly coupled to the Java platform and to the complex domain models on the Go server side.

During @xli's work on #1793, we found that we can actually make a very clean abstraction cut. We propose a BuildCommand-based API between server and agent.

BuildCommand

Instead of sending BuildWork from server to agent, we convert BuildWork to a tree of composed commands and send that to the agent. The agent's responsibility is reduced to just processing commands:

[Image: BuildCommand API]

BuildCommand is made from 5 elements:

  • Name: name of the command
  • Arguments [a list of String or Map]: arguments for the command; each command knows how to read
    and use its own arguments
  • WorkingDirectory [String]: a path relative to the agent root directory,
    used to set up the command's working directory
  • Test [BuildCommand, expectation]: command precondition; run the command only if the test result matches
    the expectation. Defaults to none.
  • RunIfConfig [String: "passed", "failed", "any"]: same concept as
    Go's runIfConfig configuration; run the command based on the current build status (pass/fail). "passed" is the default.
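
To make the five elements concrete, here is a minimal sketch of how they might map onto a Java class. The class name, field names and the empty-string default for the working directory are assumptions made for illustration; this is not the BuildCommand class from the spike.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: class, field and method names are assumptions,
// not the actual BuildCommand class from the spike.
class BuildCommandSketch {
    final String name;                  // Name
    final List<Object> arguments;       // Arguments: Strings or Maps
    String workingDirectory = "";       // WorkingDirectory, relative to the agent root (default assumed)
    BuildCommandSketch test;            // Test: precondition command, null means none
    boolean testExpectation = true;     // expected result of the test command
    String runIfConfig = "passed";      // RunIfConfig: "passed", "failed" or "any"

    BuildCommandSketch(String name, Object... arguments) {
        this.name = name;
        this.arguments = new ArrayList<>(Arrays.asList(arguments));
    }

    // Attach a precondition: the command runs only if the test result equals expectation.
    BuildCommandSketch setTest(BuildCommandSketch test, boolean expectation) {
        this.test = test;
        this.testExpectation = expectation;
        return this;
    }

    // Restrict the command to a build status; rejects values outside the three allowed ones.
    BuildCommandSketch runIf(String config) {
        if (!Arrays.asList("passed", "failed", "any").contains(config)) {
            throw new IllegalArgumentException("Unknown runIf value: " + config);
        }
        this.runIfConfig = config;
        return this;
    }
}
```

With this shape, a command that never calls setTest or runIf has no precondition and runs only while the build is passing, which matches the defaults listed above.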

Atomic BuildCommand List

Based on our spike, we found the following command list is enough to
do all the work that Go currently supports:

  • start: prepare environment for running a new list of build commands
  • compose: compose a list of commands. The agent expands it and processes
    each sub-command.
  • echo: output a string to the console log that the end user will see.
  • export: set up environment variables for the commands that
    follow. If no argument is provided, it dumps all environment variables to the console.
  • exec: execute an external (bash/shell) command.
  • test: test file/directory existence. Same as the Unix test command.
  • generateProperty: extract property and value, upload to server.
  • downloadFile: download a file.
  • downloadDir: download a directory from server.
  • uploadArtifact: upload artifact to server.
  • generateTestReport: generate test report and upload to server.
  • callExtension: call a JSON-based plugin extension, including SCM and task
    plugins.
  • callAPIBasedTaskExtension: call an API-based task extension.
  • end: end of a build; the agent cleans up the build environment and
    changes its status back to idle.
  • reportCurrentStatus: report current build status to server.
  • reportCompleting: report completing status to server.
  • reportCompleted: report build completed to server.

See
https://github.com/wpc/gocd/blob/lightweight-agent-spike/agent/src/com/thoughtworks/go/agent/BuildSession.java
for all commands processing details.
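
As a rough illustration of what "processing commands" could look like on the agent side, the sketch below dispatches on the command name for three of the commands listed above (compose, echo, export). It is not taken from BuildSession.java: the class names are hypothetical, and passing export arguments as "NAME=value" strings is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a tiny command tree, not the spike's BuildCommand.
class MiniCommand {
    final String name;
    final List<String> args = new ArrayList<>();
    final List<MiniCommand> subCommands = new ArrayList<>();

    MiniCommand(String name, String... args) {
        this.name = name;
        this.args.addAll(Arrays.asList(args));
    }

    static MiniCommand compose(MiniCommand... subs) {
        MiniCommand c = new MiniCommand("compose");
        c.subCommands.addAll(Arrays.asList(subs));
        return c;
    }
}

// Illustrative only: dispatches on the command name and records console output.
class MiniAgentSession {
    final Map<String, String> env = new LinkedHashMap<>();
    final List<String> console = new ArrayList<>();

    // Returns true when the command (and all of its sub-commands) succeeded.
    boolean process(MiniCommand cmd) {
        switch (cmd.name) {
            case "compose": {
                boolean allPassed = true;
                for (MiniCommand sub : cmd.subCommands) {
                    allPassed &= process(sub);   // non-short-circuit: every sub-command is processed
                }
                return allPassed;
            }
            case "echo":
                console.add(String.join(" ", cmd.args));
                return true;
            case "export":
                if (cmd.args.isEmpty()) {
                    // No arguments: dump all environment variables to the console.
                    env.forEach((k, v) -> console.add(k + "=" + v));
                    return true;
                }
                for (String pair : cmd.args) {
                    int eq = pair.indexOf('=');
                    if (eq < 0) {
                        return false;            // malformed argument (assumed format NAME=value)
                    }
                    env.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                return true;
            default:
                console.add("Unknown command: " + cmd.name);
                return false;
        }
    }
}
```

The remaining commands would slot in as further cases; the point is only that the agent needs a name-to-behaviour mapping and nothing about jobs, stages or pipelines.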

Combination of BuildCommand

  • BuildCommand#Test uses another BuildCommand as a precondition check
    for a command; any of the existing commands can be used. For example, in GitMaterial we want to clone the repository only if its .git directory does not exist:
// Clone only when "test -d <destDir>/.git" is false, i.e. there is no working copy yet.
BuildCommand clone = new BuildCommand("exec", "git", "clone", "--depth=2", "-n", "--branch=" + branch, url.forCommandline(), destDir.getPath());
clone.setTest(new BuildCommand("test", "-d", new File(destDir, ".git").getPath()), false);
commands.add(clone);
  • The "compose" command is used to create a group of commands to execute.
  • "runIfConfig" is used to run commands conditionally on the build status, which is not yet known when the BuildCommand tree is composed on the server side. For example, we can output the current build status with the following code:
commands.add(new BuildCommand("echo", "Current job status: passed")); // runIf defaults to "passed"
commands.add(new BuildCommand("echo", "Current job status: failed").runIf("failed"));
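
The two gating rules above can be sketched as follows. This is a simplified model under stated assumptions, not the spike's code: GatedCommand and GatedRunner are hypothetical names, a BooleanSupplier stands in for real command execution, runIf is checked before the test precondition, and a command skipped by its precondition leaves the build status unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Illustrative only: names are assumptions. BooleanSupplier stands in for
// real command execution; it returns true when the command succeeds.
class GatedCommand {
    final String label;
    final BooleanSupplier action;
    String runIf = "passed";        // "passed", "failed" or "any"
    BooleanSupplier test;           // stand-in for a nested "test" BuildCommand
    boolean expectation = true;

    GatedCommand(String label, BooleanSupplier action) {
        this.label = label;
        this.action = action;
    }

    GatedCommand runIf(String value) {
        this.runIf = value;
        return this;
    }

    GatedCommand setTest(BooleanSupplier test, boolean expectation) {
        this.test = test;
        this.expectation = expectation;
        return this;
    }
}

class GatedRunner {
    boolean buildPassed = true;
    final List<String> executed = new ArrayList<>();

    void run(GatedCommand cmd) {
        String status = buildPassed ? "passed" : "failed";
        if (!cmd.runIf.equals("any") && !cmd.runIf.equals(status)) {
            return;                 // build status does not match runIf
        }
        if (cmd.test != null && cmd.test.getAsBoolean() != cmd.expectation) {
            return;                 // precondition not met, command is skipped
        }
        executed.add(cmd.label);
        if (!cmd.action.getAsBoolean()) {
            buildPassed = false;    // a failed command fails the build
        }
    }
}
```

For instance, a clone command whose test reports that .git already exists (with expectation false) is skipped, and after a failing build step only commands marked runIf "failed" or "any" still run.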

Example of composed BuildCommand for a BuildWork

compose
  start "{buildLocator=upto42/12/defaultStage/1/defaultJob, propertyBaseUrl=https://localhost:8154/go/remoting/properties/upto42/12/defaultStage/1/defaultJob/, artifactUploadBaseUrl=https://localhost:8154/go/remoting/files/upto42/12/defaultStage/1/defaultJob/, buildId=377, buildLocatorForDisplay=upto42/12/defaultStage/1/defaultJob, consoleURI=https://localhost:8154/go/remoting/files/upto42/12/defaultStage/1/defaultJob/cruise-output/console.log?attempt=1&buildId=377}"
  export "{GO_PIPELINE_NAME=upto42, GO_FROM_REVISION_GITHUB=e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766, GO_REVISION_GITHUB=e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766, GO_SERVER_URL=https://localhost:8154/go, GO_STAGE_COUNTER=1, GO_PIPELINE_COUNTER=12, GO_TO_REVISION_GITHUB=e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766, GO_PIPELINE_LABEL=12, GO_JOB_NAME=defaultJob, GO_STAGE_NAME=defaultStage, GO_TRIGGER_USER=admin}"
  echo "Job started."
  compose
    echo "Start to prepare"
    reportCurrentStatus "Preparing"
    export
    echo "Start to update materials."
    compose
      echo "[go] Start updating files at revision StringRevision[e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766] from file:///tmp/wpc.github.io"
      exec "git" "clone" "--depth=2" "-n" "--branch=master" "file:///tmp/wpc.github.io" "pipelines/upto42"
      echo "[GIT] Fetch and reset in working directory pipelines/upto42"
      echo "[GIT] Cleaning all unversioned files in working copy"
      exec "git" "clean" "-df"
      echo "[GIT] Fetching changes"
      exec "git" "fetch" "origin"
      echo "[GIT] Performing git gc"
      exec "git" "gc" "--auto"
      echo "[GIT] Updating working copy to revision e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766"
      exec "git" "reset" "--hard" "e18d8d7a15b15cec4fbd9c05e40c6d684ac0a766"
  compose
    echo "Start to build"
    reportCurrentStatus "Building"
    compose
      exec "ls"
    reportCompleting (runIf:any)
    echo "Current job status: passed"
    echo "Current job status: failed" (runIf:failed)
    reportCurrentStatus "Completing" (runIf:any)
    echo "Start to create properties" (runIf:any)
    echo "Start to upload" (runIf:any)
    uploadArtifact "Gemfile.lock" "" (runIf:any)
    uploadArtifact "about.html" "about" (runIf:any)
    uploadArtifact "testreports" "" (runIf:any)
    uploadArtifact "about.html" "" (runIf:any)
    generateTestReport "testreports" (runIf:any)
  reportCompleted (runIf:any)
  end (runIf:any)
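
For readers who want to reproduce a dump in the notation used above, here is a small sketch that renders a command tree with two-space indentation, quoted arguments and a "(runIf:...)" suffix for non-default values. TreeCommand is a hypothetical class written for this illustration; the spike may format its output differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical tree node used to print the notation above.
class TreeCommand {
    final String name;
    final List<String> args;
    final List<TreeCommand> children = new ArrayList<>();
    String runIf = "passed";

    TreeCommand(String name, String... args) {
        this.name = name;
        this.args = Arrays.asList(args);
    }

    TreeCommand add(TreeCommand... subs) {
        children.addAll(Arrays.asList(subs));
        return this;
    }

    TreeCommand runIf(String value) {
        this.runIf = value;
        return this;
    }

    // Renders this command and its children, one command per line.
    String dump() {
        StringBuilder sb = new StringBuilder();
        dump(sb, 0);
        return sb.toString();
    }

    private void dump(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(name);
        for (String arg : args) {
            sb.append(" \"").append(arg).append('"');
        }
        if (!runIf.equals("passed")) {
            sb.append(" (runIf:").append(runIf).append(')');
        }
        sb.append('\n');
        for (TreeCommand child : children) {
            child.dump(sb, depth + 1);
        }
    }
}
```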

Benefits

  • Simplified server-agent contract, because less knowledge is shared between server and agent. The agent only needs to know how to execute a BuildCommand: no job, stage, pipeline, material or other cruise-config concepts. We no longer need to bother serializing them and passing them back and forth.
  • The simplified contract makes it possible to implement go-agent in a different language. For container-based builds such as Docker, the Java agent uses far more memory than is really needed. We plan to have a spike on building a go-lang agent. (We may call it "go-go-agent" :-D)
  • The intermediate abstraction layer opens up opportunities for easy build execution optimization. Things like parallel artifact fetching and downloading become low-hanging fruit.
  • It opens up possibilities for simplifying agent-related plugins. We can just ask SCM and Task plugins to compose a BuildCommand and send it to the agent along with the other build commands. There is no need to deploy plugins into the agent anymore. It can also save plugin authors a lot of effort on common tasks like robust HTTP calls.
  • It becomes possible to remove the existing task extensions and instead let plugins define new commands that extend the existing command list.
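
One way the last point could work is a registry that maps command names to handlers, so a plugin adds a command by registering a handler. This is purely a sketch of the idea: CommandRegistry, its methods and the "notifyChat" command in the usage note are hypothetical and do not exist in the spike or in the GoCD plugin API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a hypothetical registry of command handlers.
class CommandRegistry {
    private final Map<String, Function<List<String>, Boolean>> handlers = new HashMap<>();

    // Registers a handler for a new command name; existing commands cannot be overridden.
    void register(String name, Function<List<String>, Boolean> handler) {
        if (handlers.putIfAbsent(name, handler) != null) {
            throw new IllegalArgumentException("Command already defined: " + name);
        }
    }

    // Runs the handler for the command; an unknown command counts as a failure.
    boolean execute(String name, List<String> arguments) {
        Function<List<String>, Boolean> handler = handlers.get(name);
        if (handler == null) {
            return false;
        }
        return handler.apply(arguments);
    }
}
```

A plugin could then call something like register("notifyChat", handler) once, after which the server could include notifyChat in the command tree like any built-in command.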

The Spike

We have explored most of the places that need to be changed in a spike (https://github.com/wpc/gocd/tree/lightweight-agent-spike). Cancelling a build and secure environment are not included because they are pretty straightforward in the new model.

The diff: 7f56a07...wpc:lightweight-agent-spike

Thoughts? Feedback?

-- @xli & @wpc

@wpc and I discussed the following implementation strategies. We'd like to hear your thoughts:

1. Develop under a toggle; merge with the toggle off after all development work is done

This strategy is similar to the approach I took for websocket communication. We will have a branch on which we implement everything (or the most important parts).
Then we will send a pull request, ask for review, and merge back to trunk with the toggle off by default.
All of the changes will be hidden from trunk builds.

We may need help setting up a build pipeline that runs the tests with the toggle on, so that we can fix tests faster.

This approach means large changesets to review and merge.

2. Reverse the order of implementation: replace the existing implementation with BuildCommand piece by piece, with controlled scope (for example, start with GitMaterial).

This approach tries to carefully identify what we can switch to BuildCommand mode while limiting each change to a controllable scope, so that changes can be merged back to trunk with high confidence (without a toggle) and all existing tests cover them.

For example, the first pull request may change the GitMaterial#updateTo method. Its body will be replaced with something like:

BuildCommand cmd = createBuildCommand();
BuildCommand.process(cmd);

While this pull request is in review or being merged, we will start work on another material.
After we have converted all materials, we will start work on Builders, then BuildWork#prepareJob. Depending on our progress, we will work out a reasonably sized part of the code in BuildWork to convert next.

This approach takes more work to figure out how to implement BuildCommand step by step, and more pull requests need to be reviewed and merged.
But once we are done with the implementation, it's done, and all tests will cover our changes.

Does this mean all future server-to-agent communication can be under TLS?

@fire:
Current server-agent communication is already under TLS, except for the initial agent registration call. I am concerned about that as well, but that should probably be a separate issue. This proposal is more about the API protocol between server and agent, and less about the transport.

-- wpc

Closing. This happened and is merged in. We are still working on making it the default and removing the old way, though.