[Bug]: vault_jwt_auth_backend not added to state if error during configuration
nmasur opened this issue
Terraform Core Version
1.8.0
Terraform Vault Provider Version
3.23.0
Vault Server Version
1.15.4+ent
Affected Resource(s)
- vault_jwt_auth_backend
Expected Behavior
If there is an error configuring the backend during apply/creation (such as "error checking oidc discovery URL"), then one of the following should take place:
- The backend should remain and be recorded in state, while the apply is still considered to have failed.
- Or the backend should be removed completely and still be reported as failed to create.
Actual Behavior
If there is an error because the OIDC URL is unreachable (due to a firewall block, say), you get the following message:
```
Error: error updating configuration to Vault for path myjwtbackend: Error making API request.

Namespace: mynamespace
URL: PUT https://vault.mycorp.com/v1/auth/myjwtbackend/config
Code: 400. Errors:

* error checking oidc discovery URL
```
However, the resource is neither cleaned up nor added to state, which means the Terraform provider has left the auth backend dangling on the Vault server. If you run the apply again, you'll now see this error:

```
* path is already in use at myjwtbackend/
```

This means somebody has to go into Vault and manually clean up the dangling mount (for example, with `vault auth disable myjwtbackend`). Ideally, the backend should be added to state even when configuration fails; alternatively, it could be rolled back if necessary.
Relevant Error/Panic Output Snippet
```
vault_jwt_auth_backend.kubernetes["myjwtbackend"]: Creating...
╷
│ Error: error writing to Vault: Error making API request.
│
│ Namespace: mynamespace
│ URL: POST https://vault.mycorp.com/v1/sys/auth/myjwtbackend
│ Code: 400. Errors:
│
│ * path is already in use at myjwtbackend/
│
│ with vault_jwt_auth_backend.kubernetes["myjwtbackend"],
│ on policies_handler.tf line 315, in resource "vault_jwt_auth_backend" "kubernetes":
│ 315: resource "vault_jwt_auth_backend" "kubernetes" {
│
╵
Error: Terraform exited with code 1.
Error: Process completed with exit code 1.
```
Terraform Configuration Files
```hcl
locals {
  all_kubernetes_clusters = {
    myjwtbackend = {
      url = "https://some.invalid.url:6300"
    }
  }
}

resource "vault_jwt_auth_backend" "kubernetes" {
  for_each           = local.all_kubernetes_clusters
  description        = "Kubernetes cluster for ${each.key}"
  path               = each.key
  oidc_discovery_url = each.value.url
  bound_issuer       = each.value.url
}
```
Steps to Reproduce
- Add a `vault_jwt_auth_backend` resource whose OIDC discovery URL is unreachable.
- Run `terraform apply` to see the error creating the resource.
- Run `terraform apply` again to see the "path is already in use" error.
Debug Output
No response
Panic Output
No response
Important Factoids
No response
References
No response
Would you like to implement a fix?
None
I'm also running into this, and it's proving very challenging to resolve, since the auth method mount exists in Vault but its configuration does not. My hope was to import the auth method and then mark it tainted to force a replacement, but I am unable to import the created auth method, since the provider tries to read the auth method configuration during import (see the sketch below).
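For reference, the failing import attempt looked roughly like this, using a Terraform 1.5+ import block and assuming the mount path is the import ID (as the provider docs describe for auth backends); the resource address matches the configuration above:

```hcl
# Sketch of the import attempt; address and path follow the
# configuration from the issue description.
import {
  to = vault_jwt_auth_backend.kubernetes["myjwtbackend"]
  id = "myjwtbackend"
}
```

The plan then fails while reading `auth/myjwtbackend/config`, because the configuration was never written.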
This doesn't seem entirely like an issue with the Terraform provider: the underlying Vault API calls create the auth mount successfully, then fail to write the configuration, leaving the mount dangling. Fixing that in Vault itself would be more invasive than updating the Terraform provider to consider the auth backend created but tainted.
I think the issue is that the Terraform provider has to make two underlying API calls:
- Enable the auth method (`/sys/auth/:path`).
- Configure the auth method (`/auth/:path/config`).
When step 1 succeeds but step 2 fails, the auth method created in step 1 still exists. For the provider to roll back step 1, it would need to delete the auth method (which it might not even have permission to do in some circumstances). It would be better to add it to state, either as tainted or as complete-with-errors.
With that in mind, in order for import to work, the provider would need to separate the auth method from its configuration. If the provider expected you to manage the generic auth backend separately from its configuration, that would mirror the API more accurately. It would also mean that every auth backend takes two resources, but it would be less confusing overall; see the sketch below.
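A rough sketch of what that split could look like. `vault_auth_backend` exists in the provider today; `vault_jwt_auth_backend_config` is hypothetical and does not exist, it only illustrates the shape:

```hcl
# Step 1: mount the auth method (maps to POST /sys/auth/:path).
resource "vault_auth_backend" "kubernetes" {
  type = "jwt"
  path = "myjwtbackend"
}

# Step 2: write its configuration (maps to PUT /auth/:path/config).
# NOTE: this resource type is hypothetical; it does not exist in the
# provider today.
resource "vault_jwt_auth_backend_config" "kubernetes" {
  backend            = vault_auth_backend.kubernetes.path
  oidc_discovery_url = "https://some.invalid.url:6300"
  bound_issuer       = "https://some.invalid.url:6300"
}
```

With this split, a failure in step 2 would leave the step 1 mount in state, and a later apply would only retry the config write instead of hitting "path is already in use".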
We're also running into this problem. It makes our Terraform configuration brittle: we have to introduce checks beforehand (like the sketch below) to make sure the resource creation will succeed, but not everything is under our control, so we can still run into the error anyway.
What would make the problem less severe for us would be a timeout parameter and automatic retries until the whole resource creation succeeds. But even then, if the timeout is reached, the backend should still be added to state, so that alone is not a complete fix.
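As a stopgap, a pre-check can at least fail the run before the mount is created. A minimal sketch, assuming the hashicorp/http provider (v3+, for `status_code`) and that the cluster serves the standard OIDC discovery document; names and URLs are illustrative:

```hcl
# Probe the OIDC discovery endpoint first, so an unreachable URL fails
# the plan/apply before Vault ever creates the auth mount.
data "http" "oidc_discovery" {
  url = "https://some.invalid.url:6300/.well-known/openid-configuration"

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "OIDC discovery URL is unreachable; not creating the auth backend."
    }
  }
}

resource "vault_jwt_auth_backend" "kubernetes" {
  path               = "myjwtbackend"
  oidc_discovery_url = "https://some.invalid.url:6300"
  bound_issuer       = "https://some.invalid.url:6300"

  # Only attempt creation once the discovery probe has succeeded.
  depends_on = [data.http.oidc_discovery]
}
```

This still fails the run when the URL is unreachable, but it fails during the probe instead of leaving a dangling auth mount behind.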