pulumi / pulumi-kubernetes

A Pulumi resource provider for Kubernetes to manage API resources and workloads in running clusters

Home Page: https://www.pulumi.com/docs/reference/clouds/kubernetes/


Named resources get deleted from cluster after pulumi resource name change, even though indicated otherwise by pulumi

sfc-gh-jlangefeld opened this issue

What happened?

When changing the pulumi resource name of a kubernetes resource that was named via

	Metadata: &metav1.ObjectMetaArgs{
		Name: pulumi.String("test"),
	},

and then running pulumi up, pulumi indicates that the resource gets deleted and created (as expected), but in reality the resource only gets deleted.

This is a risk for us in the case of resource name changes, especially since both the preview and the update indicate that the resource will be created again, when in reality it never is.

We rely on resource naming via metadata.name because that's the only way to adopt existing resources, a common use case for us.
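For concreteness, adopting an existing object this way looks roughly like the following. This is a minimal sketch; the object name existing-config, its namespace, and its data are assumptions for illustration:

	package main

	import (
		corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
		metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
		"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	)

	func main() {
		pulumi.Run(func(ctx *pulumi.Context) error {
			// With server-side apply, pinning metadata.name to an object that
			// already exists in the cluster makes the provider upsert (adopt)
			// it instead of failing with "resource already exists".
			_, err := corev1.NewConfigMap(ctx, "adopted-config", &corev1.ConfigMapArgs{
				Metadata: &metav1.ObjectMetaArgs{
					Name:      pulumi.String("existing-config"), // pre-existing object (assumed)
					Namespace: pulumi.String("default"),
				},
				Data: pulumi.StringMap{"key": pulumi.String("value")},
			})
			return err
		})
	}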

Example

Here are steps to reproduce and exemplify the behavior:

  1. Create kind cluster via kind create cluster
  2. Create the quickstart project as described in https://www.pulumi.com/docs/clouds/kubernetes/get-started/create-project/
  3. Add a Kubernetes metadata.name in the appsv1.DeploymentArgs struct (a full sketch of the modified program follows these steps):
    Metadata: &metav1.ObjectMetaArgs{
    	Name: pulumi.String("test"),
    },
  4. Run pulumi up and make sure the deployment was created:
    ╰─ kubectl get deploy test
    NAME   READY   UP-TO-DATE   AVAILABLE   AGE
    test   1/1     1            1           10s
  5. Change the name of the pulumi resource to something new:
    -deployment, err := appsv1.NewDeployment(ctx, "app-dep", &appsv1.DeploymentArgs{
    +deployment, err := appsv1.NewDeployment(ctx, "app-dep1", &appsv1.DeploymentArgs{
  6. Run pulumi preview and it shows, as expected, that it wants to delete app-dep and create app-dep1:
    ╰─ pulumi preview
    Previewing update (dev)
    
    View in Browser (Ctrl+O): https://app.pulumi.com/jonny-langefeld/quickstart3/dev/previews/eb071e68-ace0-4ce3-a36e-fecf4f91f419
    
         Type                              Name             Plan
         pulumi:pulumi:Stack               quickstart3-dev
     +   ├─ kubernetes:apps/v1:Deployment  app-dep1         create
     -   └─ kubernetes:apps/v1:Deployment  app-dep          delete
    
    Resources:
        + 1 to create
        - 1 to delete
        2 changes. 1 unchanged
    
  7. Run pulumi up and it will show you that it deleted app-dep and created app-dep1:
    Updating (dev)
    
    View in Browser (Ctrl+O): https://app.pulumi.com/jonny-langefeld/quickstart3/dev/updates/11
    
         Type                              Name             Status
         pulumi:pulumi:Stack               quickstart3-dev
     +   ├─ kubernetes:apps/v1:Deployment  app-dep1         created (0.25s)
     -   └─ kubernetes:apps/v1:Deployment  app-dep          deleted (1s)
    
    Outputs:
        name: "test"
    
    Resources:
        + 1 created
        - 1 deleted
        2 changes. 1 unchanged
    
    Duration: 5s
    
  8. Check the resource on Kubernetes and it shows that it's not there anymore (note that the Kubernetes resource name should still be test because that wasn't changed):
    ╰─ kubectl get deploy test
    Error from server (NotFound): deployments.apps "test" not found
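
For reference, here is a minimal sketch of the modified quickstart program from steps 3 and 5. The nginx image, labels, and export follow the quickstart; treat the exact field values as assumptions:

	package main

	import (
		appsv1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/apps/v1"
		corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
		metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
		"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	)

	func main() {
		pulumi.Run(func(ctx *pulumi.Context) error {
			appLabels := pulumi.StringMap{"app": pulumi.String("nginx")}
			// Step 5 renames "app-dep" to "app-dep1"; the Kubernetes object
			// keeps the fixed name "test" either way.
			deployment, err := appsv1.NewDeployment(ctx, "app-dep", &appsv1.DeploymentArgs{
				Metadata: &metav1.ObjectMetaArgs{
					Name: pulumi.String("test"),
				},
				Spec: &appsv1.DeploymentSpecArgs{
					Selector: &metav1.LabelSelectorArgs{MatchLabels: appLabels},
					Replicas: pulumi.Int(1),
					Template: &corev1.PodTemplateSpecArgs{
						Metadata: &metav1.ObjectMetaArgs{Labels: appLabels},
						Spec: &corev1.PodSpecArgs{
							Containers: corev1.ContainerArray{
								corev1.ContainerArgs{
									Name:  pulumi.String("nginx"),
									Image: pulumi.String("nginx"),
								},
							},
						},
					},
				},
			})
			if err != nil {
				return err
			}
			ctx.Export("name", deployment.Metadata.Name())
			return nil
		})
	}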
    

Output of pulumi about

╰─ pulumi about
CLI
Version      3.107.0
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME        VERSION
go          unknown
kubernetes  4.10.0
kubernetes  3.30.2

Host
OS       darwin
Version  14.4.1
Arch     arm64

This project is written in go: executable='/opt/homebrew/bin/go' version='go version go1.22.2 darwin/arm64'

Current Stack: jonny-langefeld/quickstart3/dev

TYPE                          URN
pulumi:pulumi:Stack           urn:pulumi:dev::quickstart3::pulumi:pulumi:Stack::quickstart3-dev
pulumi:providers:kubernetes   urn:pulumi:dev::quickstart3::pulumi:providers:kubernetes::default
kubernetes:core/v1:ConfigMap  urn:pulumi:dev::quickstart3::kubernetes:core/v1:ConfigMap::test2


Found no pending operations associated with dev

Backend
Name           pulumi.com
URL            https://app.pulumi.com/jonny-langefeld
User           jonny-langefeld
Organizations  jonny-langefeld, snowflake
Token type     personal

Dependencies:
NAME                                        VERSION
github.com/pulumi/pulumi-kubernetes/sdk/v4  v4.10.0
github.com/pulumi/pulumi/sdk/v3             v3.112.0

Pulumi locates its logs in /var/folders/nj/gdkbhsx936989xh4_xt0gm400000gn/T/ by default

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

I was also able to replicate the issue based on the repro steps. The provider appears to plan/add based on the ID, but then delete based on Spec.Metadata.Name, causing an add/delete instead of a replace. The add upserts, giving the deployment a new ID with the same name, and the delete then removes the freshly upserted deployment.

I also note that if I create two deployments with the same Metadata.Name but different Pulumi resource names, Pulumi will register that it has created 2 deployment resources, but only 1 deployment exists in the cluster. It creates one and then immediately upserts onto it (replacing the first ID), while tracking 2 in state.
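That second scenario corresponds to a program fragment like the following (a sketch inside pulumi.Run; resource and object names are illustrative):

	// Two logical Pulumi resources that both pin metadata.name to the same
	// value. With SSA upsert, "cm-b" silently takes over the object created
	// by "cm-a", yet both resources remain tracked in the Pulumi state.
	_, err := corev1.NewConfigMap(ctx, "cm-a", &corev1.ConfigMapArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("shared-name")},
	})
	if err != nil {
		return err
	}
	_, err = corev1.NewConfigMap(ctx, "cm-b", &corev1.ConfigMapArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("shared-name")},
	})
	if err != nil {
		return err
	}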

Thank you for bringing this issue to our attention, @sfc-gh-jlangefeld. The behavior you're experiencing is currently a deliberate aspect of how server-side apply functions within the Pulumi Kubernetes provider, which unfortunately results in this scenario. When server-side apply (SSA) is in use and a new Pulumi resource is created, the provider performs an upsert operation on the corresponding Kubernetes object. Consequently, during a pulumi up operation, if the object already exists in the cluster, the provider updates the existing object to align with the desired specification in the Pulumi program instead of raising a "resource already exists" error. This behavior can lead to situations like the one you've encountered, which is not ideal.
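For illustration, SSA is the default in the 4.x provider and can be toggled explicitly on a provider instance. A minimal sketch inside pulumi.Run, assuming the kubernetes, corev1, and metav1 SDK imports; the provider and resource names are illustrative:

	// An explicit provider instance with SSA toggled explicitly. Setting
	// EnableServerSideApply to false reverts to client-side apply, which
	// fails on creation if the object already exists instead of upserting.
	provider, err := kubernetes.NewProvider(ctx, "k8s-ssa", &kubernetes.ProviderArgs{
		EnableServerSideApply: pulumi.Bool(true),
	})
	if err != nil {
		return err
	}
	// Resources opt into this provider instance via the Provider option.
	_, err = corev1.NewConfigMap(ctx, "example", &corev1.ConfigMapArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("example")},
	}, pulumi.Provider(provider))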

We're actively tracking this behavior in another open issue (#2926), which addresses the desire to alter this upsert behavior to have the provider consistently fail during creation if the object already exists in the cluster. It's worth noting that even if this behavior is modified for SSA, renaming Pulumi resources could still present challenges. In such cases, rather than the object being lost in the cluster, an error would be triggered indicating that the object already exists.

For renaming Pulumi resources, we recommend using a resource alias; see our documentation on resource aliases for further guidance.

I will be closing this issue in favor of #2926 which tracks the same underlying behaviour with SSA.

Hi @rquitales unfortunately #2926 doesn't present a solution for this issue. We actually specifically rely on the SSA upsert behavior by setting metadata.name as mentioned in the issue description. We are trying to use pulumi to adopt 1000s of existing Kubernetes resources that were previously created via other means. We cannot recreate all these resources as that would mean a short downtime for many production resources. So we would not want to fail in this case as desired in #2926.

Given that we specifically rely on the SSA upsert behavior I think we should re-open this issue.

Acceptable solutions could include the following:

  • Pulumi fails upon a Pulumi resource rename
  • reorder the create and delete of the renamed resource, so that the delete always happens and completes before the create
  • offer a protection mechanism that prevents Pulumi resource renaming (pulumi protect wouldn't work here, as there are legitimate reasons for wanting to delete a resource, so it would need to be specific to the renaming case)

I think the main issue right now is that both pulumi preview and pulumi up evidently show the wrong thing by saying the resource was deleted and created, when in reality it was only deleted, which could be fatal.

Do you think that's possible?

@sfc-gh-jlangefeld I agree with your statement that the current discrepancy between pulumi preview and pulumi up results is an issue that needs to be resolved and fixed so that users do not end up with a deleted Kubernetes object while Pulumi still believes that the object exists.

Pulumi fails upon a Pulumi resource rename

This is a change that would need to be made in the Pulumi engine, and I'm not sure it's technically possible, since the engine would only see a create event and a delete event. I don't think it's capable of knowing that these two events are due to a rename.

reorder the create and delete of the renamed resource, so that the delete always happens and completes before the create

There are ongoing discussions about delete-before-create in upstream pulumi/pulumi, ref: pulumi/pulumi#2877

offer a protection mechanism that prevents Pulumi resource renaming (pulumi protect wouldn't work here, as there are legitimate reasons for wanting to delete a resource, so it would need to be specific to the renaming case)

This should occur once we land #2926, as renaming the pulumi resource itself will result in a resource already exists error. This then allows the user to use resource aliasing to rename their pulumi resources should they really need to rename them.

We actually specifically rely on the SSA upsert behavior by setting...

To address your concerns about this, we're also investigating allowing the current upsert behaviour via an opt-in resource option once #2926 lands. This would allow users who currently rely on this behaviour to adopt resources without being blocked.

I think in your case @sfc-gh-jlangefeld you would want three things:

  1. For resources you are intentionally adopting into Pulumi, use the current default or potential future opt-in to allow upsert of an existing resource (defined either outside of Pulumi, in another Pulumi program, or in the same Pulumi program).
  2. If you are ever doing a purely logical rename of a Pulumi resource, make sure to apply an alias so that Pulumi understands this is the same resource and should be updated instead of separately created and deleted.
  3. Optionally, apply the delete-before-replace resource option to these resources so that, if a replacement is ever needed, they are first deleted and then re-created, avoiding accidentally deleting the "new" (not actually new, due to upsert) resource.

The most important part of this is (2). This looks like:

}, pulumi.Aliases([]pulumi.Alias{{Name: pulumi.String("app-dep")}}))

This ensures that Pulumi knows that app-dep and app-dep1 are "the same thing", and should be an update instead of an unrelated create and delete of two different unrelated resources.

When these are a create and delete of unrelated resources, Pulumi will do the create operation first and the delete at the end of the update, during its cleanup phase. That means the "new" resource is created first, but because it is upserted there is no new object; both the new and old logical Pulumi resources actually refer to the same physical resource. When Pulumi then deletes the "old" logical resource, it deletes the one and only physical resource.

The way to avoid this ever happening, while still supporting upsert, is to do (2) and (3) above. I believe that with those applied you should get all of the expected results here, while still getting upsert behaviour on your resources.
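Concretely, putting (2) and (3) together on the renamed resource from the repro would look roughly like this (a sketch, with the Spec elided for brevity):

	deployment, err := appsv1.NewDeployment(ctx, "app-dep1", &appsv1.DeploymentArgs{
		Metadata: &metav1.ObjectMetaArgs{
			Name: pulumi.String("test"),
		},
		// Spec omitted; identical to the original "app-dep" definition.
	},
		// (2) Tell Pulumi that "app-dep1" is the old "app-dep" under a new
		// name, so this becomes an update rather than a create plus delete.
		pulumi.Aliases([]pulumi.Alias{{Name: pulumi.String("app-dep")}}),
		// (3) If a replacement is ever needed, delete the old object before
		// creating the new one, so cleanup cannot remove the upserted object.
		pulumi.DeleteBeforeReplace(true),
	)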

Thank you for your replies and thoughts here!
I understand that (2) and (3) are a way to intentionally tell Pulumi that the resource name changed. However, we are more concerned about the unintentional case, and even more so about an implicit one: maybe the resource name is the result of some other function, and a future engineer changes that function (as opposed to directly changing a resource name).
A CI run of pulumi preview might show them that some resources would get deleted but also created, and even pulumi up would say the same thing. The reality, of course, is that the resource is only deleted, which can easily cause an outage.

We would like to either prevent this in code, or have the preview output make visible that the resource will actually be deleted after the run.

@rquitales summarized it nicely with this:

the current discrepancy between pulumi preview and pulumi up results is an issue that needs to be resolved and fixed so that users do not end up with a deleted Kubernetes object while Pulumi still believes that the object exists.

Regarding the 'failure on upsert' behavior described in #2926: that would actually be undesired for us, and we would be a candidate for the opt-in that continues to allow upserts. All we care about is that the real-world state matches the desired state. If a resource already exists, we don't want to fail; we want to accept it as the new truth, and the program should succeed because the desired state matches the real-world state. This is also why we'd like to allow the upsert behavior beyond the initial migration and adoption of existing resources.

Given all this, here's a whole new solution approach I was thinking about:
What if Pulumi wrote the Pulumi resource name as a pulumi.com/something annotation on the k8s resource? Upon deletion it would check whether the Pulumi resource name matches the value of this annotation, and only if that's the case would it actually delete the resource. This would prevent the deletion in the case of this issue, because the current resource name is app-dep1, but the Pulumi program is trying to delete app-dep.
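To make the proposal concrete, the provider's delete path could perform a guard roughly like the following. This is purely illustrative: the helper, its signature, and the annotation key "pulumi.com/resourceName" are invented for the sketch and are not part of the provider today.

	package main

	import (
		"context"
		"fmt"

		apierrors "k8s.io/apimachinery/pkg/api/errors"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/client-go/kubernetes"
	)

	// safeDelete is a hypothetical guard for the provider's delete path:
	// it refuses to delete a live object that no longer belongs to the
	// Pulumi resource asking for the deletion.
	func safeDelete(ctx context.Context, client kubernetes.Interface, ns, name, pulumiName string) error {
		live, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return nil // already gone; nothing to delete
		}
		if err != nil {
			return err
		}
		// Only delete if the live object still belongs to this Pulumi resource.
		if owner := live.Annotations["pulumi.com/resourceName"]; owner != pulumiName {
			return fmt.Errorf("refusing to delete %s/%s: owned by Pulumi resource %q, not %q",
				ns, name, owner, pulumiName)
		}
		return client.AppsV1().Deployments(ns).Delete(ctx, name, metav1.DeleteOptions{})
	}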