Problem statement: You need to create and manage cloud infrastructure
landscapes across many different AWS account/region combinations targeting
different logical environments (dev, prod, etc.).
How can a single Terraform project be used across all necessary account, region, and environment combinations? How can the IaC be modeled to enforce security best practices, uniformity, and logically isolated failure domains, while also accommodating intentional heterogeneity?
Solution: In my experience, Terraform's workspace feature -- used in concert with a compound ${AWS_ACCOUNT_ID}_${AWS_REGION}_${ENV}-based workspace naming convention -- enables scalable, DRY re-use patterns and logical infrastructure segmentation, reducing toil and lead time.
tf-workspaces-demo offers a reference implementation.
- See GitHub Actions run 7299615362 as an example invocation of its CI/CD workflow against main.
- See GitHub Actions run 7287133745 as an example invocation of its CI/CD workflow against a pull request.
- Use a compound ${AWS_ACCOUNT_ID}_${AWS_REGION}_${ENV} workspace naming scheme to logically segment Terraform operations (and state) across AWS account/region/environment boundaries, ensuring infrastructure redundancy across sufficiently limited failure domains.
- Ensure uniformity across workspaces, while also accommodating intentional per-workspace (or per-account, per-region, or per-environment) heterogeneity if/where needed.
- Enable the low-friction creation of new infrastructure in new account/region/environment combinations by adding a single workspace entry to workspaces.json.
- Dynamically drive the creation of GitHub Actions matrix builds to plan/apply Terraform, ensuring CI/CD automation elastically scales and contracts as workspaces are created and/or decommissioned.
- Use terraform.workspace to impose an allowed_account_ids constraint on the AWS provider, such that an environment is never plan/apply'd to the wrong account.
- Bonus: leverage Terraform workspaces to dynamically create ephemeral pull-request-based development and testing environments.
- See PR 16 and its associated GitHub Actions workflow as an example.
- See PR 21 and its environment's destruction as an example of the automated destruction of an ephemeral environment after a PR is closed or merged.
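
As a rough illustration of the goals above, the following is a minimal sketch (not necessarily how tf-workspaces-demo implements it) of deriving the account, region, and environment from terraform.workspace and feeding them to the AWS provider. The local names and the instance_count_by_env map are hypothetical, introduced only for illustration. It assumes every workspace name follows the ${AWS_ACCOUNT_ID}_${AWS_REGION}_${ENV} convention, so it would fail in Terraform's built-in "default" workspace.

```hcl
locals {
  # Split a workspace name such as "123456789012_us-east-1_dev" on "_".
  # AWS account IDs and region names contain no underscores, so the first
  # three elements are the account ID, the region, and the environment.
  # Any further suffix (e.g. "pr-16") lands at index 3 and is ignored here.
  workspace_parts = split("_", terraform.workspace)

  account_id = local.workspace_parts[0]
  region     = local.workspace_parts[1]
  env        = local.workspace_parts[2]

  # Hypothetical example of intentional per-environment heterogeneity:
  # one uniform configuration, with values that vary by environment.
  instance_count_by_env = {
    dev  = 1
    prod = 3
  }

  # Fall back to 1 for environments not listed in the map.
  instance_count = lookup(local.instance_count_by_env, local.env, 1)
}

provider "aws" {
  region = local.region

  # The provider refuses to operate if the active credentials belong to
  # an account other than the one encoded in the workspace name.
  allowed_account_ids = [local.account_id]
}
```

With something like this in place, selecting a workspace is enough to point the project at the right account and region; no per-workspace variable files are required for those two values.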
Disclaimers
- It's often useful to subdivide IaC across responsibility-based projects,
each serving a different "layer" of infrastructure purpose (vs. problematically large, sprawling
"monolithic" Terraform projects). For example, foundational infrastructure,
such as VPC and networking configuration, may be managed in separate Terraform project(s)
from higher level platform infrastructure, such as Kubernetes clusters.
tf-workspaces-demo glosses over this, focusing instead on the effective use of Terraform workspace conventions within projects. Effective modeling of responsibility layers across distinct Terraform projects is a separate art altogether ;)
- To decouple the demo from real-world AWS dependencies, tf-workspaces-demo uses localstack-persist as a local, mocked AWS. No real AWS resources are created; instead, the demo focuses on illustrating high level Terraform/AWS patterns that are largely agnostic to the specific AWS resources under management.
- The use of localstack-persist -- and the demo's need to persist localstack data across GitHub Actions jobs -- requires lotsa extra GitHub Actions workflow monkey business that wouldn't appear in a real world workflow targeting a real cloud provider. Try not to be too distracted by that :)
- tf-workspaces-demo's GitHub Actions workflow is not intended as the canonical design universally applicable to all projects and contexts. Depending on needs, it may be attractive to structure a project's CI/CD differently. For example, there could be distinct jobs -- or even separate workflows, entirely -- targeting dev and prod (each composed of per-workspace parallelized matrix builds), such that CI/CD parallelizes operations within the same environment, while still ensuring per-workspace Terraform operations against prod hinge on dev operations' success. Additionally, the workflow(s) could be enhanced with additional steps and fanciness: terratest tests, OPX automated plan analysis, automated pull request comments reporting plan output, etc.
While only peripherally relevant to the core problem statement, tf-workspaces-demo demos some other fun stuff too.
- On pull requests, plan/apply to an ephemeral pull-request workspace; destroy that workspace if/when the pull request is closed or merged (and automate the creation of PR comments announcing these actions). This also demonstrates how the ${AWS_ACCOUNT_ID}_${AWS_REGION}_${ENV} workspace naming scheme accommodates additional, increasingly granular suffixes if/where needed, like ${AWS_ACCOUNT_ID}_${AWS_REGION}_${ENV}_pr-${PULL_REQUEST_ID}.
- localstack-persist is used to create a local mock AWS, mostly to decouple tf-workspaces-demo from real AWS dependencies, while still illustrating some AWS/Terraform design patterns. Zooming out, though, localstack is useful not only for demos like tf-workspaces-demo, but also in developing and testing real Terraform projects and modules, depending on context.
- tf-workspaces-demo uses actions/upload-artifact in a kinda-fun-but-maybe-hacky way to persist localstack-persist data across multiple GitHub Actions jobs. This is a bit unusual; try not to be too distracted.
- By imposing a strategy.max-parallel: 1 on the GitHub Actions matrix build, Terraform actions are invoked serially against each workspace in the order in which workspaces are listed in workspaces.json. This means an error applying to a dev workspace fails the build before any Terraform action is taken against prod workspaces.
- tf-workspaces-demo's docker-compose.yaml shows how to establish a localstack-based local AWS environment, pre-seeded with an S3 bucket for use persisting Terraform state to S3, as well as a DynamoDB table for use enforcing Terraform state locking.
See the source code comments for particular details and relevant callouts.
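
For orientation, an S3 backend with DynamoDB-based locking might be declared roughly as follows. This is a sketch under assumptions, not the demo's actual configuration: the bucket and table names are hypothetical placeholders, and the localstack-specific endpoint settings are omitted because they differ across Terraform versions.

```hcl
terraform {
  backend "s3" {
    # Hypothetical names; substitute the bucket and table that were
    # pre-created for state storage and locking.
    bucket         = "example-terraform-state"
    dynamodb_table = "example-terraform-locks"

    # Backend blocks cannot interpolate variables or locals, so these
    # values are static and shared by every workspace.
    key     = "terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
```

Because the key is static, per-workspace isolation comes from the backend itself: for any non-default workspace, the S3 backend stores state under env:/<workspace name>/terraform.tfstate by default, so each account/region/environment workspace gets its own state object in the same bucket.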
- What about Terragrunt?
  In many contexts, Terragrunt (and similar tools) are great. However, their use invites additional complexity (and additional questions about how best to structure IaC across account, region, and environment boundaries). Often, in my experience, Terraform workspaces are sufficient.
- Don't Terraform child modules enable DRY reuse?
  Generally, Terraform child modules and workspaces address slightly different problems and are not mutually exclusive. While workspaces facilitate the application of a Terraform project against multiple target contexts, provider configurations, and against isolated states, child modules are more simply generic abstractions of opinionated Terraform "recipes." Modules often target specific resources (or combinations of resources), but are largely agnostic to the surrounding context. Child modules can be used and applied within parent Terraform projects, though they cannot be applied independently; they have no project-specific state and provider configuration. As such, child modules enable reuse and composability -- and/or enforce best practices governance -- along different dimensions of concern.
- Couldn't region be an input variable?
  Rather than being encoded in the compound workspace naming convention, the Terraform project could utilize a var.region input variable, yes. However, this would lead to two problems:
  - Workspace naming collision. For example, 123_dev's us-east-1 and 123_dev's us-west-2 deployments would no longer have unique workspace names.
  - Workspace S3 state collision. For example, Terraform would attempt to use s3://${BUCKET}/env:/123_dev/terraform.tfstate for both 123_dev's us-east-1 and 123_dev's us-west-2 applications.