3h4x / t0rn-collator

t0rn Collator Infrastructure

Architecture

Key components:

  • VPC with one private and one public subnet
  • t0rn-collator in the private subnet and a NAT gateway in the public subnet
  • ASG with min and max instances both set to 1 for t0rn-collator
  • Configuration with userdata

Atmos was used to keep the configuration DRY.

The atmos CLI is a universal tool for DevOps and cloud automation. It allows deploying and destroying Terraform and helmfile components, as well as running workflows to bootstrap or tear down all resources in an account.

atmos includes workflows for:

  • Provisioning large, multi-account Terraform environments
  • Deploying helm charts to Kubernetes clusters with helmfile
  • Executing helm commands on Kubernetes clusters
  • Executing kubectl commands on Kubernetes clusters

It's a very good and versatile tool.

Note: More documentation

Installing Atmos

  • macOS: brew install atmos
  • Linux

For more options go to docs

Basic Architecture

Note: More documentation

TLDR Bootstrapping infrastructure

The t0rn.yaml workflow provisions the two components this setup depends on: vpc and t0rn-collator.

atmos workflow -f t0rn.yaml plan

If the plan looks good (planning t0rn-collator is expected to fail while the VPC does not exist yet), then:

atmos workflow -f t0rn.yaml apply

When the time comes:

atmos workflow -f t0rn.yaml destroy
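
The three commands above could be wrapped in a small helper so that a mistyped step is rejected and destroy needs an explicit opt-in. This is only a sketch: the function name t0rn_workflow and the variables WORKFLOW_FILE, DRY_RUN and CONFIRM_DESTROY are hypothetical and are not part of atmos or of this repository. The atmos invocation itself is copied unchanged from the commands above.

```shell
# Hypothetical wrapper; only the final `atmos workflow` call comes from this README.
WORKFLOW_FILE="${WORKFLOW_FILE:-t0rn.yaml}"

t0rn_workflow() {
  case "${1:-}" in
    plan|apply) ;;
    destroy)
      # Assumed safety switch: destroy runs only when explicitly confirmed.
      if [ "${CONFIRM_DESTROY:-no}" != "yes" ]; then
        echo "refusing to destroy: set CONFIRM_DESTROY=yes" >&2
        return 1
      fi
      ;;
    *)
      echo "usage: t0rn_workflow plan|apply|destroy" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "atmos workflow -f $WORKFLOW_FILE $1"
    return 0
  fi
  atmos workflow -f "$WORKFLOW_FILE" "$1"
}
```

Sourcing the snippet and running DRY_RUN=1 t0rn_workflow plan prints the command without touching any infrastructure. The helper deliberately does not chain plan and apply, because the first plan of t0rn-collator is expected to fail before the VPC exists.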

Important choices

  • atmos makes IaC DRY and very easy to manage and extend
  • terraform state is local to not introduce additional complexity
  • t0rn-collator image was rebuilt to mitigate issues with entrypoint and libssl library (https://hub.docker.com/repository/docker/3h4xx/t0rn-collator)
  • t0rn-collator image is run directly by docker with a restart policy. The restart policy is not foolproof and may fail to restart the container under certain circumstances, but the most common case (a server restart) was tested and the container was running fine afterwards
  • t0rn-collator is deployed in the private subnet with an empty SG ingress; outbound traffic goes via the NAT gateway
  • eu-central-1 was picked as the deployment region because it is close to other nodes, which should lower latency during sync
  • t3a.medium is an arbitrary choice, made only to limit development costs
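
The docker restart policy mentioned above can be illustrated with a short sketch. Several things here are assumptions: the policy name unless-stopped (the README does not say which policy is used), the container name, the omitted image tag, and the helper names collator_run and collator_restart_policy. The image name is taken from the Docker Hub link above, and the collator's own command-line arguments are left out.

```shell
# Sketch only: policy, container name and tag are assumptions, not repo settings.
COLLATOR_IMAGE="${COLLATOR_IMAGE:-3h4xx/t0rn-collator}"  # no tag given, so Docker uses :latest
COLLATOR_NAME="${COLLATOR_NAME:-t0rn-collator}"
RESTART_POLICY="${RESTART_POLICY:-unless-stopped}"

# Start the container detached with a restart policy (collator arguments omitted).
collator_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "docker run -d --name $COLLATOR_NAME --restart $RESTART_POLICY $COLLATOR_IMAGE"
    return 0
  fi
  docker run -d --name "$COLLATOR_NAME" --restart "$RESTART_POLICY" "$COLLATOR_IMAGE"
}

# Show the restart policy Docker recorded for the container.
collator_restart_policy() {
  docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$COLLATOR_NAME"
}
```

After a reboot, collator_restart_policy together with docker ps gives a quick check that the policy is in place and the container came back. The policy only helps if the Docker daemon itself starts at boot.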

Security and SRE TODO

  • AWS multi-account setup where each environment has its own account
  • VPC Flow logs enabled
  • deployment should span subnets in multiple AZs (not done due to the cost of additional NAT gateways), otherwise it is not fault tolerant
  • container orchestration should be introduced
  • monitoring should be added, as the collator already exposes a Prometheus exporter
  • each provisioned t0rn-collator starts blockchain synchronization from scratch; this can be improved by using a baked AMI
  • CICD for validation, linting and deployment
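
As a starting point for the validation and linting item, a CI job could run Terraform's built-in checks over the component directories. The sketch below is an assumption-heavy illustration: COMPONENTS_DIR defaults to components/terraform, a common atmos layout that has not been verified against this repository, and the helper names run_step and ci_validate are hypothetical. The terraform subcommands and flags are standard Terraform CLI.

```shell
# Hypothetical CI check; COMPONENTS_DIR is an assumed layout, not taken from the repo.
COMPONENTS_DIR="${COMPONENTS_DIR:-components/terraform}"

# Run a command, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ci_validate() {
  # Formatting check: exits non-zero if any file would be rewritten.
  run_step terraform fmt -check -recursive "$COMPONENTS_DIR" || return 1
  # Validate each component directory without configuring a backend.
  for dir in "$COMPONENTS_DIR"/*/; do
    [ -d "$dir" ] || continue
    run_step terraform -chdir="$dir" init -backend=false -input=false || return 1
    run_step terraform -chdir="$dir" validate || return 1
  done
}
```

Running DRY_RUN=1 ci_validate lists the commands that would be executed, which makes the job easy to review before wiring it into a pipeline.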