Terraform Provider Iterative

Terraform provider for CML

The Iterative Provider is a Terraform plugin that enables full lifecycle management of cloud computing resources, including GPUs, from your favorite vendors. Two types of resources are available:

  • Runner (iterative_cml_runner)
  • Machine (iterative_machine)

The provider is designed to deliver benefits such as:

  • Unified logging for workflows run on cloud resources
  • Automatic provisioning of cloud resources
  • Automatic unregistration and removal of cloud resources (never forget to turn your GPU off again)
  • Arguments inherited from the GitHub/GitLab runner for ease of integration (name, labels, idle-timeout, repo, token, and driver)

Usage

Runner

The runner resource is a thin wrapper over the GitLab and GitHub self-hosted runners, abstracting their functionality into a common specification that lets you adjust the main runner settings, such as idle timeouts or custom runner labels.

The runner resource also provides features like unified logging and automated provisioning and management of cloud resources across various vendors.

Configuring the vendor credentials

This provider requires a repository token to register and unregister self-hosted runners during the cloud resource lifecycle. Depending on the platform you use, the instructions to get that token may vary; please refer to your platform's documentation.

This token can be passed to the provider through the CML_TOKEN environment variable, as in the following example:

export CML_TOKEN=···

Additionally, you need to provide credentials for the cloud provider where the computing resources should be allocated. Follow the steps below to get started.

Basic usage

  1. Set up your provider credentials as ENV variables
AWS

export AWS_SECRET_ACCESS_KEY=YOUR_KEY
export AWS_ACCESS_KEY_ID=YOUR_ID
export CML_TOKEN=YOUR_REPO_TOKEN

Azure

export AZURE_CLIENT_ID=YOUR_ID
export AZURE_CLIENT_SECRET=YOUR_SECRET
export AZURE_SUBSCRIPTION_ID=YOUR_SUBSCRIPTION_ID
export AZURE_TENANT_ID=YOUR_TENANT_ID
export CML_TOKEN=YOUR_REPO_TOKEN

  2. Save your Terraform file as main.tf.
AWS

terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
    repo = "https://github.com/iterative/cml"
    driver = "github"
    labels = "tf"

    cloud = "aws"
    region = "us-west"
    instance_type = "m"
    # Uncomment it if GPU is needed:
    # instance_gpu = "v100"
}

Azure

terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
   repo = "https://github.com/iterative/cml"
    driver = "github"
    labels = "tf"

    cloud = "azure"
    region = "us-west"
    instance_type = "m"
    # Uncomment it if GPU is needed:
    # instance_gpu = "v100"
}

💡 Alternatively, you can use the JSON Terraform Configuration Syntax instead of the default HCL syntax.
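
For reference, the AWS runner above might look roughly like this as a main.tf.json file. This is a sketch of the equivalent JSON configuration syntax, not an excerpt from the provider documentation:

{
  "terraform": {
    "required_providers": {
      "iterative": {
        "source": "iterative/iterative"
      }
    }
  },
  "provider": {
    "iterative": {}
  },
  "resource": {
    "iterative_cml_runner": {
      "runner": {
        "repo": "https://github.com/iterative/cml",
        "driver": "github",
        "labels": "tf",
        "cloud": "aws",
        "region": "us-west",
        "instance_type": "m"
      }
    }
  }
}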

  3. Launch it!
terraform init
terraform apply --auto-approve

Argument reference

  • driver — values: gitlab, github. The kind of runner that you are setting up.
  • repo — The Git repository to subscribe to.
  • token — A personal access token. On GitHub, the token must have Workflow and Repository permissions. If not specified, the Iterative Provider looks for the CML_TOKEN environment variable.
  • labels — default: cml. Your runner will listen for workflows tagged with this label. Ideal for assigning workflows to selected runners.
  • idle-timeout — default: 5min. The maximum time the runner waits for jobs. After the timeout, the runner automatically unregisters from the repository and cleans up all cloud resources. If set to 0, the runner never times out (be warned if you've got a cloud GPU).
  • cloud — values: aws, azure. Sets the cloud vendor.
  • region — values: us-west, us-east, eu-west, eu-north; default: us-west. Sets the region. Native AWS or Azure region names are also accepted.
  • image — default: iterative-cml on AWS, Canonical:UbuntuServer:18.04-LTS:latest on Azure. Sets the image to be used. On AWS, the provider searches by image name (not by id), taking the latest version if multiple images share the same name. On Azure, use the form Publisher:Offer:SKU:Version.
  • spot — values: boolean; default: false. If true, launches a spot instance.
  • spot_price — values: float with at most 5 decimals; default: -1. Sets the maximum price per hour that you are willing to pay. If not specified, the current spot bidding price will be used.
  • name — default: iterative_{UID}. Sets the instance name and names related resources after it. On Azure, groups everything under a resource group with that name.
  • instance_hdd_size — default: 10. Sets the instance hard disk size in GB.
  • instance_type — values: m, l, xl; default: m. Sets the instance CPU size. You can also specify vendor-specific machine types on AWS, e.g. t2.micro. See the equivalence tables below.
  • instance_gpu — values: (empty), tesla, k80; default: (empty). Selects the desired GPU for supported instance_types.
  • ssh_private — An SSH private key in PEM format. If not provided, a private and public key pair will be generated automatically and returned in terraform.tfstate.
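
Putting several of these arguments together, a runner that requests a GPU spot instance could be sketched as follows. This is an illustration only: the resource label and the spot price are made up, and the token is assumed to come from the CML_TOKEN environment variable as described above.

resource "iterative_cml_runner" "runner" {
  repo   = "https://github.com/iterative/cml"
  driver = "github"
  labels = "tf"
  # token omitted: the provider falls back to the CML_TOKEN environment variable

  cloud         = "aws"
  region        = "us-west"
  instance_type = "l"
  instance_gpu  = "k80"

  # Request a spot instance; spot_price is a hypothetical hourly ceiling.
  # The default of -1 uses the current spot bidding price.
  spot       = true
  spot_price = 0.55000
}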

Machine

Setup instructions:

  1. Set up your provider credentials as ENV variables
AWS

export AWS_SECRET_ACCESS_KEY=YOUR_KEY
export AWS_ACCESS_KEY_ID=YOUR_ID

Azure

export AZURE_CLIENT_ID=YOUR_ID
export AZURE_CLIENT_SECRET=YOUR_SECRET
export AZURE_SUBSCRIPTION_ID=YOUR_SUBSCRIPTION_ID
export AZURE_TENANT_ID=YOUR_TENANT_ID

  2. Save your Terraform file as main.tf.
AWS

terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
  cloud = "aws"
  region = "us-west"
  name = "machine"
  instance_hdd_size = "10"
  instance_type = "m"
  # Uncomment if a GPU is needed:
  # instance_gpu = "v100"
}

Azure

terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
  cloud = "azure"
  region = "us-west"
  name = "machine"
  instance_hdd_size = "10"
  instance_type = "m"
  # Uncomment if a GPU is needed:
  # instance_gpu = "v100"
}

  3. Launch your instance
terraform init
terraform apply --auto-approve
  4. Stop the instance

Run the following to destroy your instance:

terraform destroy --auto-approve

Argument reference

  • cloud — values: aws, azure. Sets the cloud vendor.
  • region — values: us-west, us-east, eu-west, eu-north; default: us-west. Sets the region. Native AWS or Azure region names are also accepted.
  • image — default: iterative-cml on AWS, Canonical:UbuntuServer:18.04-LTS:latest on Azure. Sets the image to be used. On AWS, the provider searches by image name (not by id), taking the latest version if multiple images share the same name. On Azure, use the form Publisher:Offer:SKU:Version.
  • name — default: iterative_{UID}. Sets the instance name and names related resources after it. On Azure, groups everything under a resource group with that name.
  • spot — values: boolean; default: false. If true, launches a spot instance.
  • spot_price — values: float with at most 5 decimals; default: -1. Sets the maximum price per hour that you are willing to pay. If not specified, the current spot bidding price will be used.
  • instance_hdd_size — default: 10. Sets the instance hard disk size in GB.
  • instance_type — values: m, l, xl; default: m. Sets the instance CPU size. You can also specify vendor-specific machine types on AWS, e.g. t2.micro. See the equivalence tables below.
  • instance_gpu — values: (empty), tesla, k80; default: (empty). Sets the desired GPU for supported instance_types.
  • ssh_private — An SSH private key in PEM format. If not provided, a private and public key pair will be generated automatically and returned in terraform.tfstate.
  • startup_script — A startup script, also known as userData on AWS and customData on Azure. It can be expressed as multiline text using Terraform heredoc syntax.
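
As an illustration of the image and startup_script arguments, a machine that selects an explicit Azure image (in Publisher:Offer:SKU:Version form) and runs a short startup script via Terraform heredoc syntax could be sketched like this; the packages installed are arbitrary examples:

resource "iterative_machine" "machine" {
  cloud  = "azure"
  region = "us-west"
  name   = "machine"

  # Azure images use the Publisher:Offer:SKU:Version form
  image = "Canonical:UbuntuServer:18.04-LTS:latest"

  instance_type     = "m"
  instance_hdd_size = "10"

  # customData on Azure (userData on AWS), written with Terraform heredoc syntax
  startup_script = <<-EOF
    #!/bin/bash
    apt-get update
    apt-get install -y git
  EOF
}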

Requirements

To use instance_type and instance_gpu, you'll need permission to launch instances from the supported cloud vendors. Please ensure that you have sufficient quota with your cloud provider for the instances you intend to provision with the Iterative Provider. If you're just starting out with a new vendor account, we recommend trying the Iterative Provider with approved instance types, such as t2.micro on AWS.

Example with a native AWS instance type and region

terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
  region = "us-west-1"
  image = "iterative-cml"
  name = "machine"
  instance_hdd_size = "10"
  instance_type = "t2.micro"
}

Supported vendors

The Iterative Provider currently supports AWS and Azure; Google Cloud Platform is not yet supported.

AWS instance equivalences

The instance type in AWS is calculated by joining the instance_type and instance_gpu values.

type  gpu   aws
m     -     m5.2xlarge
l     -     m5.8xlarge
xl    -     m5.16xlarge
m     k80   p2.xlarge
l     k80   p2.8xlarge
xl    k80   p2.16xlarge
m     v100  p3.2xlarge
l     v100  p3.8xlarge
xl    v100  p3.16xlarge

region    aws
us-west   us-west-1
us-east   us-east-1
eu-north  eu-north-1
eu-west   eu-west-1
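
For example, based on the tables above, instance_type = "m" combined with instance_gpu = "k80" in the us-west region resolves to a p2.xlarge instance in us-west-1; a minimal sketch:

resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-west"  # resolves to us-west-1
  instance_type = "m"
  instance_gpu  = "k80"      # "m" + "k80" resolves to a p2.xlarge instance
}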

Azure instance equivalences

The instance type in Azure is calculated by joining the instance_type and instance_gpu values.

type  gpu   azure
m     -     Standard_F8s_v2
l     -     Standard_F32s_v2
xl    -     Standard_F64s_v2
m     k80   Standard_NC6
l     k80   Standard_NC12
xl    k80   Standard_NC24
m     v100  Standard_NC6s_v3
l     v100  Standard_NC12s_v3
xl    v100  Standard_NC24s_v3

region    azure
us-west   westus2
us-east   eastus
eu-north  northeurope
eu-west   westeurope

The iterative-cml image

We've created a GPU-ready image based on Ubuntu 18.04. It comes with the following stack already installed:

  • Nvidia drivers
  • Docker
  • Nvidia-docker

License

Apache License 2.0