Redis resources are not getting fully deleted when executed with a different user

Question


l-lafin opened this issue

We got stuck on the following scenario while using the module:

We have been deploying our apps and services on CF through a deployment pipeline using CF user 'X' (a superuser with the OrgManager, SpaceManager and SpaceDeveloper roles). When we needed to switch the pipeline to CF user 'Y' (also a superuser), Terraform recreated most of our resources on CF (apps, networks, services, etc.). Unfortunately, the cloudfoundry_service_key resource remained, and the pipeline could not create a new credential because a key with that name already existed in the space.

This leads to two separate discussions:

1. Perhaps we should apply the same naming strategy used for the apps (the postfix parameter) to cloudfoundry_service_key, so that we at least don't get stuck in cases like this.
2. Identify and fix why the service key doesn't get deleted in this case.
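A minimal sketch of the first idea, assuming the module exposes a postfix input like the one that produces app names such as tf-redis-exporter-84ADF. The variable name `name_postfix`, the `random_id` fallback and the instance resource name `redis1` are assumptions for illustration, not the module's actual interface.

```hcl
# Hypothetical sketch: name the service key with the same postfix used for the
# module's apps. `name_postfix` and the random_id fallback are assumptions.
variable "name_postfix" {
  type        = string
  default     = ""
  description = "Hypothetical postfix appended to resource names"
}

# Fallback entropy when no postfix is given (hashicorp/random provider).
resource "random_id" "key_suffix" {
  byte_length = 4
}

locals {
  postfix = var.name_postfix != "" ? var.name_postfix : random_id.key_suffix.hex
}

resource "cloudfoundry_service_key" "key" {
  # e.g. "key-84ADF" instead of the fixed name "key"
  name             = "key-${local.postfix}"
  service_instance = cloudfoundry_service_instance.redis1.id # assumed resource name
}
```

Because `random_id` is kept in Terraform state, the generated suffix stays the same across applies and only changes when that resource is replaced, so this adds entropy per deployment rather than per apply.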

Leandro Lafin · Answer 1 · Mon Oct 04 2021 20:28:08 GMT+0800 (China Standard Time)

Anything else to add @ScottGuymer?

Scott Guymer · Answer 2 · Mon Oct 04 2021 20:49:16 GMT+0800 (China Standard Time)

Do we think it only happens when the user deploying the Terraform to CF changes?

I think it may be a provider issue that makes it think the key has been deleted, so it removes it from the state.

Leandro Lafin · Answer 3 · Mon Oct 04 2021 20:55:01 GMT+0800 (China Standard Time)

Well, as far as I remember, this was the first time our Redis instance needed to be recreated (I'm not entirely sure). It should be easy to test: we just need to deploy a Redis instance and force it to be recreated.

Andy Lo-A-Foe · Answer 4 · Tue Oct 05 2021 12:18:18 GMT+0800 (China Standard Time)

It's definitely a good idea to introduce some entropy in the key name; many service keys are simply named "key" right now, which is not good at all.

But are you saying the service key still existed, even though the underlying Redis was deleted?

It sounds more like a concurrency issue, e.g. CF might have finished creating a new Redis instance while the old one was still being deleted?

Andy Lo-A-Foe · Answer 5 · Tue Oct 05 2021 12:41:51 GMT+0800 (China Standard Time)

I just did a quick test, and luckily the key name is scoped to the instance itself. That still leaves the question of why a second service key is being created for the same instance.

Andy Lo-A-Foe · Answer 6 · Tue Oct 05 2021 13:34:33 GMT+0800 (China Standard Time)

@l-lafin @ScottGuymer one question: are you using CF service bindings in that space?

Scott Guymer · Answer 7 · Tue Oct 05 2021 17:47:46 GMT+0800 (China Standard Time)

It looks like it is being bound:

https://github.com/philips-internal/terraform-api-gateway/blob/main/main.tf#L128

Leandro Lafin · Answer 8 · Tue Oct 05 2021 19:44:28 GMT+0800 (China Standard Time)

That's a good remark, @loafoe. We are not (explicitly) binding it to our apps, but when we deleted Redis and the apps manually, CF complained about a binding that we couldn't see (we inspected the apps in the CLI and there wasn't any binding).

Regarding the terraform-api-gateway module linked by @ScottGuymer: we are not using the app_authenticator_service app (it's optional), so Redis shouldn't be bound to it. Or maybe there is some kind of glitch and the binding happens even without creating the app?

Leandro Lafin · Answer 9 · Tue Oct 05 2021 19:52:11 GMT+0800 (China Standard Time)

Do you remember the error message and the steps we took to reproduce it, @jdelucaa?

Joana Deluca Kleis · Answer 10 · Tue Oct 05 2021 20:59:22 GMT+0800 (China Standard Time)

We hit a weird situation where Terraform detected changes made outside of Terraform, including the deletion of the service key and changes to the exporter app:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "*** apply":

  # module.incharge_environment_test.module.redis.cloudfoundry_app.exporter has been changed
  ~ resource "cloudfoundry_app" "exporter" {
      ~ environment          = (sensitive value)
        id                   = "08db1d89-6afd-40dd-a90a-de31f11452f6"
        name                 = "tf-redis-exporter-84ADF"
        # (17 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }
  # module.incharge_environment_test.module.redis.cloudfoundry_service_key.key has been deleted
  - resource "cloudfoundry_service_key" "key" {
      - credentials      = {
          - "hostname" = "redis-1ccc5c29.svc-2.na1.cluster.hsdp.io"
          - "password" = "D1oKvG9KU800cY06UjvfCbAW"
          - "port"     = "6379"
        } -> null
      - id               = "ccbc70ac-c87f-4c36-9cee-1427d71e1ec1" -> null
      - name             = "key" -> null
      - service_instance = "1ccc5c29-cd01-42a2-9a0a-4f9dc9ee7d70" -> null
    }

From there, it detected that it needed to update the app and recreate the key:

  # module.incharge_environment_test.module.redis.cloudfoundry_app.exporter will be updated in-place
  ~ resource "cloudfoundry_app" "exporter" {
      ~ environment          = (sensitive value)
        id                   = "08db1d89-6afd-40dd-a90a-de31f11452f6"
      ~ id_bg                = "08db1d89-6afd-40dd-a90a-de31f11452f6" -> (known after apply)
        name                 = "tf-redis-exporter-84ADF"
        # (16 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.incharge_environment_test.module.redis.cloudfoundry_service_key.key will be created
  + resource "cloudfoundry_service_key" "key" {
      + credentials      = (known after apply)
      + id               = (known after apply)
      + name             = "key"
      + service_instance = "1ccc5c29-cd01-42a2-9a0a-4f9dc9ee7d70"
    }

It all happened after the CF user's password had expired and we had to switch to a new user. The execution above failed because the new user did not have permission to perform the actions needed for this apply:

Error: You are not authorized to perform the requested action

So we gave the new user the correct permissions and tried again. Then, we got the following error:

Error: The service key name is taken: key

  with module.incharge_environment_test.module.redis.cloudfoundry_service_key.key,
  on .***/modules/incharge_environment_test.redis/main.tf line 18, in resource "cloudfoundry_service_key" "key":
  18: resource "cloudfoundry_service_key" "key" {

We checked on CF and the key was actually there, so the error made sense. To force its deletion, we commented out the code that instantiates the Redis module, but the apply failed with the following error:

Error: Please delete the service_bindings associations for your service_instances.

The key and the exporter app were gone, but we couldn't find any service bindings on CF; we checked both the routes and the apps. 😟

Since the key name is scoped to the service instance, this looks like an operational or provider issue. We originally thought the key name had to be unique, which is why we opened this issue to append a unique identifier to it.

To get the issue out of the way we destroyed the whole environment. 😄

dhavalshah02 · Answer 11 · Mon Oct 25 2021 18:15:52 GMT+0800 (China Standard Time)

Here is another case where this appears.

I am trying to create an ephemeral environment to test PICS. While destroying the environment, I get the following error:
Error: Please delete the service_bindings, service_keys, and routes associations for your service_instances.

I think this can be solved by adding the recursive_delete parameter to the module's resource "cloudfoundry_service_instance" "redis1" {}.

The description of the parameter can be found here

I can have a go at implementing it.
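A sketch of the proposed change, assuming the module's instance resource is named `redis1` as in the comment above; the `name` value and the `space_id` and `redis_plan_id` inputs are placeholders, not the module's actual interface. In the Cloud Foundry Terraform provider, `recursive_delete` (default `false`) asks Cloud Foundry to delete the instance's service bindings, service keys and routes together with the instance. This targets the destroy error; it would not by itself explain why the key dropped out of Terraform state.

```hcl
variable "space_id" {
  type = string # hypothetical input: GUID of the target space
}

variable "redis_plan_id" {
  type = string # hypothetical input: GUID of the Redis service plan
}

resource "cloudfoundry_service_instance" "redis1" {
  name         = "tf-redis" # placeholder name
  space        = var.space_id
  service_plan = var.redis_plan_id

  # Delete associated service bindings, service keys and routes on destroy,
  # instead of failing with "Please delete the service_bindings,
  # service_keys, and routes associations for your service_instances."
  recursive_delete = true
}
```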

Scott Guymer · Answer 12 · Mon Oct 25 2021 19:18:14 GMT+0800 (China Standard Time)

From what I have seen, I think the service key is scoped to the user AND the service, so when you change the user doing the deploy, you can no longer interact with the key.

So you can have a key called "key" for the service, but only the person who created it can see it? Maybe?

I think this, coupled with some issue in the provider or API where it thinks the key doesn't exist or has been deleted (probably because it can't see it in the API), then causes the error when you try to re-create it.

Andy Lo-A-Foe · Answer 13 · Mon Oct 25 2021 20:57:48 GMT+0800 (China Standard Time)

AFAIK service keys are not user-bound, but recursive_delete might be a good start. Is this easily reproducible, BTW, or does it only happen under certain conditions (changing users, something else)?

Scott Guymer · Answer 14 · Mon Oct 25 2021 21:01:00 GMT+0800 (China Standard Time)

It seems to happen only when switching users and triggering something that forces a re-create.

Maybe we could put together a repo?