Named resources get deleted from cluster after pulumi resource name change, even though indicated otherwise by pulumi
sfc-gh-jlangefeld opened this issue
What happened?
When changing the Pulumi resource name of a Kubernetes resource that was explicitly named via

```go
Metadata: metav1.ObjectMetaArgs{
    Name: pulumi.String("test"),
},
```

and then running `pulumi up`, Pulumi indicates that the resource gets deleted and created (as expected), but in reality the resource only gets deleted.
This is a risk for us when resource names change, especially because both the preview and the update indicate that the resource will be created again, when in reality it never is. We rely on resource naming via `metadata.name` because that's the only way to adopt existing resources, a common use case for us.
Example
Here are some steps to reproduce and exemplify the behavior:
- Create a kind cluster via `kind create cluster`
- Create the quickstart project as described in https://www.pulumi.com/docs/clouds/kubernetes/get-started/create-project/
- Add a Kubernetes `metadata.name` in the `appsv1.DeploymentArgs` struct via:

  ```go
  Metadata: metav1.ObjectMetaArgs{
      Name: pulumi.String("test"),
  },
  ```

- Run `pulumi up` and make sure the deployment was created:

  ```
  ╰─ kubectl get deploy test
  NAME   READY   UP-TO-DATE   AVAILABLE   AGE
  test   1/1     1            1           10s
  ```
- Change the name of the Pulumi resource to something new:

  ```diff
  -deployment, err := appsv1.NewDeployment(ctx, "app-dep", &appsv1.DeploymentArgs{
  +deployment, err := appsv1.NewDeployment(ctx, "app-dep1", &appsv1.DeploymentArgs{
  ```
- Run `pulumi preview` and it correctly shows that it wants to delete `app-dep` and create `app-dep1`:

  ```
  ╰─ pulumi preview
  Previewing update (dev)

  View in Browser (Ctrl+O): https://app.pulumi.com/jonny-langefeld/quickstart3/dev/previews/eb071e68-ace0-4ce3-a36e-fecf4f91f419

       Type                              Name             Plan
       pulumi:pulumi:Stack               quickstart3-dev
   +   ├─ kubernetes:apps/v1:Deployment  app-dep1         create
   -   └─ kubernetes:apps/v1:Deployment  app-dep          delete

  Resources:
      + 1 to create
      - 1 to delete
      2 changes. 1 unchanged
  ```
- Run `pulumi up` and it will show you that it deleted `app-dep` and created `app-dep1`:

  ```
  Updating (dev)

  View in Browser (Ctrl+O): https://app.pulumi.com/jonny-langefeld/quickstart3/dev/updates/11

       Type                              Name             Status
       pulumi:pulumi:Stack               quickstart3-dev
   +   ├─ kubernetes:apps/v1:Deployment  app-dep1         created (0.25s)
   -   └─ kubernetes:apps/v1:Deployment  app-dep          deleted (1s)

  Outputs:
      name: "test"

  Resources:
      + 1 created
      - 1 deleted
      2 changes. 1 unchanged

  Duration: 5s
  ```
- Check the resource on Kubernetes and it shows that it's not there anymore (note that the Kubernetes resource name should still be `test` because that wasn't changed):

  ```
  ╰─ kubectl get deploy test
  Error from server (NotFound): deployments.apps "test" not found
  ```
Output of `pulumi about`

```
╰─ pulumi about
CLI
Version      3.107.0
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME        VERSION
go          unknown
kubernetes  4.10.0
kubernetes  3.30.2

Host
OS       darwin
Version  14.4.1
Arch     arm64

This project is written in go: executable='/opt/homebrew/bin/go' version='go version go1.22.2 darwin/arm64'

Current Stack: jonny-langefeld/quickstart3/dev

TYPE                          URN
pulumi:pulumi:Stack           urn:pulumi:dev::quickstart3::pulumi:pulumi:Stack::quickstart3-dev
pulumi:providers:kubernetes   urn:pulumi:dev::quickstart3::pulumi:providers:kubernetes::default
kubernetes:core/v1:ConfigMap  urn:pulumi:dev::quickstart3::kubernetes:core/v1:ConfigMap::test2

Found no pending operations associated with dev

Backend
Name           pulumi.com
URL            https://app.pulumi.com/jonny-langefeld
User           jonny-langefeld
Organizations  jonny-langefeld, snowflake
Token type     personal

Dependencies:
NAME                                        VERSION
github.com/pulumi/pulumi-kubernetes/sdk/v4  v4.10.0
github.com/pulumi/pulumi/sdk/v3             v3.112.0

Pulumi locates its logs in /var/folders/nj/gdkbhsx936989xh4_xt0gm400000gn/T/ by default
```
Also able to replicate the issue based on the repro steps. It appears to be planning/adding based on the ID, but then deleting based on `Spec.Metadata.Name`, causing an add/delete instead of a replace. The add upserts, giving the deployment a new ID but the same name, and the delete then removes the freshly upserted deployment.
I also note that if I create two deployments with the same `Metadata.Name` but different IDs, Pulumi will register that it has created two deployment resources, but only one deployment exists in the cluster. It creates one and then immediately upserts onto it (replacing the first ID), while tracking two in state.
Thank you for bringing this issue to our attention, @sfc-gh-jlangefeld. The behavior you're experiencing is currently a deliberate aspect of how server-side apply functions within the Pulumi Kubernetes provider, which unfortunately results in this scenario. When utilizing server-side apply (SSA) and creating a new Pulumi resource, the provider performs an upsert operation on the corresponding Kubernetes object. Consequently, during a `pulumi up` operation, if the object already exists in the cluster, instead of encountering a "resource already exists" error, the provider updates the existing object to align with the desired specification in the Pulumi program. This behavior can lead to situations like the one you've encountered, which is not ideal.
We're actively tracking this behavior in another open issue (#2926), which addresses the desire to alter this upsert behavior to have the provider consistently fail during creation if the object already exists in the cluster. It's worth noting that even if this behavior is modified for SSA, renaming Pulumi resources could still present challenges. In such cases, rather than the object being lost in the cluster, an error would be triggered indicating that the object already exists.
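The difference between today's upsert-on-create and the strict create proposed in #2926 can be sketched with a toy model in plain Go. This is only an illustration of the semantics, not the actual provider code; the `cluster` type and both function names are made up for this sketch:

```go
package main

import "fmt"

// cluster models Kubernetes objects keyed by their metadata.name.
type cluster map[string]string

// upsertCreate mimics the current SSA behavior: if an object with the
// same name already exists, it is silently patched instead of raising
// a "resource already exists" error.
func upsertCreate(c cluster, name, spec string) {
	c[name] = spec // create and update are the same operation
}

// strictCreate mimics the behavior proposed in #2926: creation fails
// when the object already exists in the cluster.
func strictCreate(c cluster, name, spec string) error {
	if _, exists := c[name]; exists {
		return fmt.Errorf("deployments.apps %q already exists", name)
	}
	c[name] = spec
	return nil
}

func main() {
	c := cluster{"test": "v1"}

	upsertCreate(c, "test", "v2")
	fmt.Println(c["test"]) // the existing object was silently updated to v2

	err := strictCreate(c, "test", "v3")
	fmt.Println(err != nil) // true: strict create refuses to overwrite
}
```

With upsert, "creating" the renamed Pulumi resource succeeds by updating the object that already exists, which is why no error surfaces during the rename.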
For renaming Pulumi resources, we recommend utilizing a resource alias. Further guidance can be found in our documentation on resource aliases.
I will be closing this issue in favor of #2926 which tracks the same underlying behaviour with SSA.
Hi @rquitales, unfortunately #2926 doesn't present a solution for this issue. We actually specifically rely on the SSA upsert behavior by setting `metadata.name`, as mentioned in the issue description. We are trying to use Pulumi to adopt thousands of existing Kubernetes resources that were previously created via other means. We cannot recreate all these resources, as that would mean a short downtime for many production resources, so we would not want to fail in this case as desired in #2926.
Given that we specifically rely on the SSA upsert behavior, I think we should re-open this issue.
Acceptable solutions could include the following:
- pulumi fails upon a Pulumi resource rename
- reorder the create and delete of the renamed resource, so that the delete always happens and completes before the create
- offer a protection mechanism that prevents Pulumi resource renaming (`pulumi protect` wouldn't work here, as there are legitimate reasons for wanting to delete a resource, so it would need to be specific to the renaming case)
I think the main issue right now is that both `pulumi preview` and `pulumi up` evidently show the wrong thing by saying the resource was deleted and created, when in reality it was only deleted, which could be fatal.
Do you think that's possible?
@sfc-gh-jlangefeld I agree with your statement that the differing results of `pulumi preview` and `pulumi up` are an issue that needs to be resolved and fixed, so that users do not end up with a deleted Kubernetes object while Pulumi still believes that the object exists.
> pulumi fails upon a Pulumi resource rename

This is a change that would need to be done in the Pulumi engine, and I'm not sure it's technically possible, since the engine would only see a create event and a delete event. I don't think it's capable of knowing that these two events are due to a rename.

> reorder the create and delete of the renamed resource, so that the delete always happens and completes before the create

There are ongoing discussions about delete-before-create in upstream pulumi/pulumi, ref: pulumi/pulumi#2877

> offer a protection mechanism that prevents Pulumi resource renaming (`pulumi protect` wouldn't work here, as there are legitimate reasons for wanting to delete a resource, so it would need to be specific to the renaming case)

This should occur once we land #2926, as renaming the Pulumi resource itself will result in a `resource already exists` error. This then allows the user to use resource aliasing to rename their Pulumi resources should they really need to.

> We actually specifically rely on the SSA upsert behavior by setting...

To address your concerns, we're also investigating allowing the current upsert behaviour once #2926 is landed, via an opt-in resource option. This would allow users who currently rely on this behaviour to adopt resources without being blocked.
I think in your case @sfc-gh-jlangefeld you would want three things:
1. For resources you are intentionally adopting into Pulumi, use the current default (or a potential future opt-in) to allow upsert of an existing resource (defined either outside of Pulumi, in another Pulumi program, or in the same Pulumi program).
2. If you are ever doing a purely logical rename of a Pulumi resource, make sure to apply an `alias` so that Pulumi understands this is the same resource and should be updated instead of separately created and deleted.
3. Optionally, apply the `delete-before-replace` resource option to these resources so that in case a replacement is needed, they are first deleted and then re-created, avoiding accidentally deleting the "new" (not actually new, due to upsert) resource.
The most important part of this is (2). This looks like:

```go
}, pulumi.Aliases([]pulumi.Alias{{Name: pulumi.String("app-dep")}}))
```

This ensures that Pulumi knows that `app-dep` and `app-dep1` are "the same thing", and performs an update instead of an unrelated create and delete of two different resources.
Without the alias, these are a create and delete of unrelated resources: Pulumi will do the create operation first, and the delete at the end of the update during its cleanup phase. That means the "new" resource is created first, but there is no new resource because it is upserted, so both the new and old logical Pulumi resources actually refer to the same physical resource. When Pulumi then deletes the "old" logical resource, it deletes the one and only physical resource.
The way to avoid this ever happening - while still supporting upsert - is to do (2) and (3) above. I believe with those applied, you should get all of the expected results here - while still being able to get upsert behaviour on your resources.
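The ordering problem described above can be sketched with a toy model in plain Go. This is an illustration of the sequencing only, not provider code: the cluster is just a map keyed by Kubernetes name (`test` stays the same across the rename), and upsert means a "create" silently overwrites:

```go
package main

import "fmt"

// createThenDelete models the default ordering during the rename:
// the new logical resource (app-dep1) is "created" first, which upserts
// onto the existing object, then the cleanup phase "deletes" the old
// logical resource (app-dep), which points at the same physical object.
func createThenDelete(c map[string]bool) {
	c["test"] = true  // create app-dep1: upsert onto the existing object
	delete(c, "test") // delete app-dep: removes the one physical object
}

// deleteThenCreate models the delete-before-replace ordering: the old
// logical resource is deleted first, then the new one is created.
func deleteThenCreate(c map[string]bool) {
	delete(c, "test") // delete the old logical resource first
	c["test"] = true  // then create the new one
}

func main() {
	a := map[string]bool{"test": true}
	createThenDelete(a)
	fmt.Println(a["test"]) // false: the one physical object is gone

	b := map[string]bool{"test": true}
	deleteThenCreate(b)
	fmt.Println(b["test"]) // true: the object survives the rename
}
```

This is why delete-before-replace avoids the data loss here: the delete can no longer land on the object the upserting create just produced.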
Thank you for your replies and thoughts here!
I understand that (2) and (3) are a way to intentionally tell Pulumi that the resource name changed. However, we were more concerned about the unintentional case, and even more so about an implicit one: maybe the resource name is the result of some other function, and a future engineer changes that function (as opposed to directly changing a resource name).
The CI running a preview would show them that some resources would get deleted, but also created. Even the `pulumi up` would say the same thing. But the reality, of course, is that the resource is actually deleted, which can easily cause an outage.
We would like to either prevent this in code, or visualize via the preview output that after the run the resource is actually deleted.
@rquitales summarized it nicely with this:

> the current different results of `pulumi preview` and `pulumi up` is an issue that needs to be resolved and fixed so that users do not end up with a deleted Kubernetes object, while Pulumi still believes that the object exists.
Regarding the "fail on upsert" behavior described in #2926: that would actually be undesired for us, and we would be a candidate for the opt-in to still allow upserts. All we care about is that the state of the real world is as it should be. If a resource already exists, we don't want to fail; we want to accept it as the new truth, and the program should succeed because the desired state matched the real-world state. This is also why we'd like to allow the upsert behavior beyond the initial migration and adoption of existing resources.
Given all this, here's a whole new solution approach I was thinking about:
What if Pulumi wrote the Pulumi resource name as a `pulumi.com/something` annotation into the k8s resource? Upon deletion, it would check whether the Pulumi resource name matches the value of this annotation, and only if that's the case would it go ahead and actually delete the resource. This would prevent the deletion in the case of this issue, because the current resource name is `app-dep1`, but the Pulumi program is actually trying to delete `app-dep`.
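For illustration, the proposed ownership check could look like the following sketch in plain Go. The annotation key and the `safeToDelete` function are hypothetical, not an existing Pulumi feature:

```go
package main

import "fmt"

// Hypothetical annotation key; if this were implemented, the real key
// would be chosen by the provider.
const ownerAnnotation = "pulumi.com/resource-name"

// safeToDelete sketches the proposed check: only delete the Kubernetes
// object if the Pulumi resource name recorded on it matches the name of
// the logical resource being deleted.
func safeToDelete(annotations map[string]string, pulumiName string) bool {
	owner, ok := annotations[ownerAnnotation]
	return ok && owner == pulumiName
}

func main() {
	// After the rename, the upsert has stamped the live object with the
	// new Pulumi resource name...
	ann := map[string]string{ownerAnnotation: "app-dep1"}

	// ...so the delete issued for the old logical resource is refused,
	fmt.Println(safeToDelete(ann, "app-dep")) // false
	// while a delete issued for the current owner would proceed.
	fmt.Println(safeToDelete(ann, "app-dep1")) // true
}
```

An object with no annotation at all would also fail the check, which doubles as protection against deleting objects Pulumi never managed.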