This repository contains the code used to provision the infrastructure for running tests and benchmarks against Nomad test clusters.
Nomad test clusters are a set of servers with hundreds or thousands of simulated nodes, which are
created using nomad-nodesim. The Nomad server processes are not simulated and are expected to
run on their own hosts, mimicking real-world deployments. The Nomad servers are the focus of
benchmarking and load testing.
The nomad-bench repository contains a number of components that work together to create the
benchmarking and load testing environment.
The infra directory contains the code that manages deployed cloud environments, and is partitioned by AWS region.
The shared directory contains reusable Terraform modules, Ansible roles and collections, and Nomad job specifications.
The tools directory hosts our custom-written Go tools aimed at running and recording benchmarking experiments. Please see the nomad-load and nomad-metrics readme files for more information on each tool and how to run it.
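For orientation, the layout described above looks roughly like this (simplified; only the paths referenced in this document are shown):

infra/
  eu-west-2/                 # deployed cloud environments, partitioned by AWS region
    core/
    core-nomad/
    test-cluster-template/
shared/
  ansible/                   # reusable Ansible roles and collections
tools/
  nomad-load/                # see the nomad-load readme
  nomad-metrics/             # see the nomad-metrics readme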
To run this project you need a number of tools installed on your machine; the walkthrough below uses at least Terraform, Python, Ansible, Make, and the Nomad CLI.
The project also needs an AWS account where the infrastructure will be built and run. The resources used have a non-trivial monetary cost.
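As a quick sanity check that these tools are available, the standard version subcommands for each can be run locally:

terraform version
python --version
ansible --version
make --version
nomad version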
Virtual environments allow you to isolate Python package installation for specific projects.
Create and activate a virtual environment in the ./shared/ansible directory.
cd ./shared/ansible
python -m venv .venv
source .venv/bin/activate
cd ../../
Run the make deps target from the root to install the dependencies.
make deps
Navigate to the ./infra/eu-west-2/core directory and edit the empty variables within the terraform.tfvars file to match your requirements and environment setup.
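As a purely illustrative sketch of what that edit looks like (the variable names below are hypothetical; use the ones actually present in the checked-in terraform.tfvars):

# Hypothetical variable names, for illustration only.
region         = "eu-west-2"
project_name   = "core"
ssh_public_key = "ssh-ed25519 AAAA... user@host"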
Once customizations have been made, Terraform can be used to build the infrastructure resources.
terraform init
terraform plan
terraform apply --auto-approve
Once the infrastructure has been provisioned, you can extract the mTLS and SSH materials from the Terraform state. The following command details which files are written to your local machine.
make
With the infrastructure provisioned, run Ansible to configure the base components. This includes Nomad.
cd ./ansible && ansible-playbook ./playbook.yaml && cd ..
Since the cluster was just created, the Nomad ACL system must be bootstrapped. The resulting Nomad ACL token is written to ./ansible/nomad-token.
cd ./ansible && ansible-playbook --tags bootstrap_acl ./playbook_server.yaml && cd ..
The base infrastructure has now been provisioned, so next we need to configure some Nomad resources. From the ./infra/eu-west-2/core directory, print the Terraform output and export the NOMAD_* environment variables.
terraform output message
We will also need to export the NOMAD_TOKEN environment variable using the bootstrap token, which can be found within ./ansible/nomad-token.
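If you would rather not copy the value by hand, a minimal sketch, assuming the file contains only the token value:

export NOMAD_TOKEN="$(cat ./ansible/nomad-token)"
Or export it directly: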
export NOMAD_TOKEN=e2d9d6e1-8158-0a74-7b09-ecdc23317c51
Navigate to the core-nomad directory and run Terraform.
terraform init
terraform plan
terraform apply --auto-approve
Once completed, the base nomad-bench infrastructure will be provisioned and running. This includes InfluxDB, which is exposed via the address shown in the Terraform output. The password for the admin user can be found in the Nomad UI variables section under the nomad/jobs/influxdb path.
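The same value can also be read with the standard Nomad CLI, assuming the NOMAD_* environment variables exported earlier still point at the core cluster:

nomad var get nomad/jobs/influxdb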
The infra directory contains a template that can be used to create the infrastructure for a test cluster. It can simply be copied to generate the base configuration for a new test cluster.
cp -r ./infra/eu-west-2/test-cluster-template ./infra/eu-west-2/test-cluster-<YOUR NAME>
The newly created Terraform code requires a single variable to be set via a tfvars file. This can be created using the commands below from inside the new directory.
cd ./infra/eu-west-2/test-cluster-<YOUR NAME>
cat <<EOF > terraform.tfvars
project_name = "test-cluster-<YOUR NAME>"
EOF
The test cluster definitions are stored within the main.tf file and should be customized before Terraform is used for provisioning. The locals definition is the most likely area where changes will be made and serves as a place to add Ansible playbook variables. Below is an example of a single test cluster which sets custom Ansible variables to modify the InfluxDB collection interval and the Nomad agent in-memory telemetry interval and retention period.
locals {
  test_clusters = {
    (var.project_name) = {
      server_instance_type = "m5.large"
      server_count         = 3
      ansible_server_group_vars = {
        influxdb_telegraf_input_nomad_interval        = "1s"
        nomad_telemetry_in_memory_collection_interval = "1s"
        nomad_telemetry_in_memory_retention_period    = "1m"
      }
    }
  }
}
Once customizations have been made, Terraform can be used to build the infrastructure resources.
terraform init
terraform plan
terraform apply --auto-approve
With the base infrastructure built, we can configure the EC2 instances using Ansible. Customizations to Ansible variables can also be made at this point, using the files within ./ansible/group_vars and ./ansible/host_vars.
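As a hedged sketch, a server group_vars override could reuse the Ansible variables shown in the test cluster locals above (the exact file name depends on your inventory group names):

# ./ansible/group_vars/<GROUP NAME>.yaml (illustrative values only)
influxdb_telegraf_input_nomad_interval: "1s"
nomad_telemetry_in_memory_collection_interval: "1s"
nomad_telemetry_in_memory_retention_period: "1m"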
cd ./ansible && \
ansible-playbook ./playbook.yaml && \
cd ..
The Ansible playbooks also support compiling, distributing, and running Nomad based on a local copy of the codebase. This must be run independently of the previous Ansible command, due to the tags used. It will copy your code to the bastion host and perform a remote compilation, before downloading the binary, distributing it to the EC2 instances, and restarting the Nomad process.
cd ./ansible && \
ansible-playbook --tags custom_build --extra-vars build_nomad_local_code_path=<PATH_TO_CODE> playbook.yaml
Terraform produces a number of Nomad job specification files which are designed to be run on the core cluster, but interact with the test cluster. These are located within the jobs directory of
core cluster, but interact with the test cluster. These are located within the jobs directory of
your test cluster infrastructure directory. When performing the job registration, you should ensure
the NOMAD_* environment variables are set and point to the core cluster.
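Before registering anything, a quick sanity check that the CLI is pointed at the core cluster is to list its server members; this is a standard Nomad command:

nomad server members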
The nomad-nodesim-<CLUSTER NAME>.nomad.hcl job utilises nomad-nodesim to register Nomad
clients with the test cluster servers. By default, it will register 100 clients that are spread
across two datacenters (dc1, dc2).
nomad job run \
-var 'group_num=1' \
./jobs/nomad-nodesim-<CLUSTER NAME>.nomad.hcl
The nomad-load-<CLUSTER NAME>.nomad.hcl job utilises the nomad-load tool to execute load against the Nomad cluster. The job specification should be modified to run the load testing scenario you want. It also includes a Telegraf task that ships telemetry data from the load testing tool to the central InfluxDB server.
nomad job run \
./jobs/nomad-load-<CLUSTER NAME>.nomad.hcl
The nomad-gc-<CLUSTER NAME>.nomad.hcl job can optionally be run to periodically trigger the Nomad garbage collection process. This helps manage Nomad server memory utilisation in situations where large numbers of jobs are being dispatched.
nomad job run \
-var 'gc_interval_seconds=60' \
./jobs/nomad-gc-<CLUSTER NAME>.nomad.hcl
Once you have finished with the infrastructure, you should run terraform destroy in each directory where terraform apply was previously run.
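For the directories used in this walkthrough, the teardown looks roughly like the sketch below; destroying the test cluster and core-nomad resources before the core infrastructure they depend on is an assumption about ordering, but generally the safe direction (the core-nomad path is assumed to sit alongside core):

(cd ./infra/eu-west-2/test-cluster-<YOUR NAME> && terraform destroy)
(cd ./infra/eu-west-2/core-nomad && terraform destroy)
(cd ./infra/eu-west-2/core && terraform destroy)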