[feature] Multi-CSP VPC & ASG bootstrapping
rmvangun opened this issue · comments
Problem Description
Very little CSP-specific code is needed to launch a production-ready Talos cluster on any CSP. Why not handle that for everyone?
- Create the VPC, ASG, and all necessary networking components according to best practices for security and availability
- Version the CSP layer and allow for one-click upgrades from Omni
- Allow for very basic options... Internet Gateway? NAT*? Zone selection? Some autoscaling options? Gah, the instance types 🤔
*Could you forward proxy via WireGuard to Omni to effectively "air-gap" the cluster in a VPC?
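To make the "very basic options" idea concrete, here is a rough sketch of what such an option set might look like with some sanity checks. Everything here is hypothetical: `CspNetworkOptions`, its field names, and its defaults are made up for illustration and are not part of Omni or Talos. The NAT check assumes an AWS-style public NAT gateway that needs an internet gateway for egress.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CspNetworkOptions:
    """Hypothetical option set; none of these names come from Omni."""

    zones: List[str]
    instance_type: str
    internet_gateway: bool = True
    nat_gateway: bool = False
    min_nodes: int = 3
    max_nodes: int = 3

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the options look sane."""
        problems = []
        if not self.zones:
            problems.append("at least one zone is required")
        if len(set(self.zones)) != len(self.zones):
            problems.append("zones must be unique")
        if not self.instance_type:
            problems.append("instance_type is required")
        if self.min_nodes < 1:
            problems.append("min_nodes must be at least 1")
        if self.max_nodes < self.min_nodes:
            problems.append("max_nodes must be >= min_nodes")
        # Assumption: AWS-style public NAT, which routes out through an internet gateway.
        if self.nat_gateway and not self.internet_gateway:
            problems.append("a NAT gateway needs an internet gateway for egress")
        return problems
```

Keeping the surface this small is the point: a handful of fields maps onto Terraform module variables, and anything fancier stays in the open-sourced modules.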
Solution
Maybe do it in Terraform and pay HashiCorp their license fees. I think drift warnings and the ability to version the Terraform are key. Open-source the Terraform modules and let folks deploy them however they want, but permit that from Omni as well. The UI could be pretty minimal, with a terminal view (same look/feel as the log outputs already in Omni) showing the raw plan/apply if you care to look at it. Bootstrap the remote state using the CSP-native option (S3, etc.).
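One possible way to get a drift warning is to lean on `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below is only an illustration of that idea, not how Omni does or would do it; `PlanStatus`, `classify_plan_exit_code`, and `check_drift` are hypothetical names. Note that exit code 2 also fires for config changes that have not been applied yet, so it is a coarse signal.

```python
import subprocess
from enum import Enum


class PlanStatus(Enum):
    IN_SYNC = "in_sync"
    DRIFTED = "drifted"
    ERROR = "error"


def classify_plan_exit_code(code: int) -> PlanStatus:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return PlanStatus.IN_SYNC  # plan succeeded, no changes
    if code == 2:
        return PlanStatus.DRIFTED  # plan succeeded, changes present
    return PlanStatus.ERROR  # 1 (or anything unexpected) is treated as an error


def check_drift(workdir: str) -> PlanStatus:
    """Run a non-interactive plan in `workdir` (assumes terraform is on PATH and initialized)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

The captured stdout from the same run is what a terminal view could display as the raw plan.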
I believe all the major CSPs provide some OIDC option, so you can auth Omni to each CSP given a documented role/policy and you're off.
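For AWS, the "documented role/policy" could be an IAM role whose trust policy allows `sts:AssumeRoleWithWebIdentity` from an OIDC identity provider. The sketch below builds such a trust policy document. It assumes Omni would act as the OIDC issuer, which is not something the issue confirms; the function name, the issuer host, and the audience value are all placeholders.

```python
import json


def aws_oidc_trust_policy(account_id: str, issuer_host: str, audience: str) -> dict:
    """Build an IAM trust policy for a role assumed via an OIDC identity provider.

    issuer_host is the issuer URL without the https:// prefix (hypothetical here).
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Only accept tokens minted for the expected audience.
                "Condition": {"StringEquals": {f"{issuer_host}:aud": audience}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = aws_oidc_trust_policy("123456789012", "omni.example.com", "sts.amazonaws.com")
    print(json.dumps(policy, indent=2))
```

The other major CSPs have their own equivalents of this federation setup, so each would need its own documented policy.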
State resolution could be a pain if things go awry. I wonder if Terraform Stacks will make things smoother here. I haven't looked into it much, but I assume it may handle multi-stage rollout/rollback more like CloudFormation Stacks, which would give you a higher degree of reliability in the deployment.
Alternative Solutions
- Use Terraform CDK vs. HCL?
- Use CSP-native solutions (CloudFormation 🤮)
- Pulumi, Crossplane, etc... but honestly the history and staying power of Terraform feels nicer, and you don't need anything fancy, just get that VPC and ASGs rolled out
Notes
Even if there isn't a ton of code associated with each CSP, I still think it would be valuable to handle this from Omni. Multi-cloud means multi-know-how. Going from not knowing a thing about a CSP to having a production-ready k8s cluster actually seems possible.
The magic of Omni/Talos is great but there's all this dang setup before you can really deploy it anywhere. Ironically I have an easier time building bare metal clusters than I do cloud with Talos (tells ya sumthin).