kubernetes / cloud-provider

cloud-provider defines the shared interfaces which Kubernetes cloud providers implement. These interfaces allow various controllers to integrate with any cloud provider in a pluggable fashion. Also serves as an issue tracker for SIG Cloud Provider.

Track underlying cloud resources for Services

andrewsykim opened this issue

SIG CP backlog/tracking issue for kubernetes/kubernetes#70159

Following Tim's comment, we also need to discuss whether we want to document this as a best practice or enforce it across existing providers.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@andrewsykim this issue and the originating one in k/k seem to concern the very point I raised the other day on Slack about attaching DigitalOcean load-balancer UUIDs as annotations. I'm currently PoC'ing whether this is possible with the latest cloud provider framework (i.e., the Kubernetes 1.15 dependencies), especially with regards to Service objects being modified not only by the service controller. (kubernetes/kubernetes#70159 (comment) touches on the same topic, albeit at a time when PATCH semantics were not yet in use.)

especially with regards to Service objects being modified not only by the service controller.

(edit: Sorry for hitting enter too soon :/)

I was thinking about this as well. With PATCH now being used for updating Service status / metadata, we shouldn't be hitting conflict issues if the underlying cloud provider implementation updates the annotations in between. That said, it seems to run counter to what the interface currently defines:

cloud-provider/cloud.go

Lines 117 to 121 in 4025669

// EnsureLoadBalancer creates a new load balancer 'name', or updates the existing one. Returns the status of the balancer
// Implementations must treat the *v1.Service and *v1.Node
// parameters as read-only and not modify them.
// Parameter 'clusterName' is the name of the cluster as presented to kube-controller-manager
EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error)

Maybe relaxing what the interface defines is the easiest way; however, it feels somewhat fragile without a generic mechanism to enforce the behavior.

/remove-lifecycle stale

My PoC to validate that patching the Service object does not interfere with the service controller's own operations was successful: I couldn't spot any error logs or behavioral changes as the LBs came up. The patching seems to trigger another service update, but that's not too much of a concern.
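For reference, here's a minimal sketch of roughly what such a patch could look like (assuming a recent client-go; the annotation key and helper name are made up for illustration, not the actual DigitalOcean ones). The point is that the Service is updated through the API server instead of mutating the object handed to EnsureLoadBalancer:

package provider

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// lbIDAnnotation is a made-up annotation key used for illustration only.
const lbIDAnnotation = "example.com/load-balancer-id"

// attachLoadBalancerID records the cloud LB ID on the Service via a JSON
// merge patch against the API server, leaving the *v1.Service passed to
// EnsureLoadBalancer untouched.
func attachLoadBalancerID(ctx context.Context, client kubernetes.Interface, namespace, name, lbID string) error {
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				lbIDAnnotation: lbID,
			},
		},
	})
	if err != nil {
		return fmt.Errorf("building patch: %v", err)
	}

	_, err = client.CoreV1().Services(namespace).Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}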

One issue I'm still somewhat struggling with is that I'd like the patching to happen while my cloud provider methods are executing (i.e., during EnsureLoadBalancer), which means I need to inject a Kubernetes clientset. The Initialize interface method already passes in a cloudprovider.ControllerClientBuilder to derive the client from. However, the client should ideally (at least that's what I think) go into the cloud implementation that's returned by the factory callback registered via cloudprovider.RegisterCloudProvider. I currently manage to do this by setting a field on my cloud struct during Initialize, but that feels kind of dirty and relies on the current cloud provider initialization order (that is, Initialize being called before the service controller starts).
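In code, the workaround looks roughly like this (names are made up; only the clientset plumbing of the cloudprovider.Interface implementation is shown):

package provider

import (
	"k8s.io/client-go/kubernetes"
	cloudprovider "k8s.io/cloud-provider"
)

// cloud is a stand-in provider implementation; only the clientset plumbing
// is shown here.
type cloud struct {
	client kubernetes.Interface
	// ... other provider state (API credentials, region, etc.)
}

// Initialize is invoked by the cloud-controller-manager before the
// controllers start, so the clientset stashed here is available by the time
// EnsureLoadBalancer runs.
func (c *cloud) Initialize(clientBuilder cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {
	c.client = clientBuilder.ClientOrDie("example-cloud-provider")
}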

I could also set up a controller of my own, but that seems like a lot of extra work I don't really need. Maybe we can extend the Register callback signature to pass along the necessary configuration settings?

Curious if anyone has any ideas and what folks think. (@MrHohn, did you possibly explore a similar direction?)

@aoxn do we want to work on this for v1.16?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I actually think it's okay to close this now. Services with Type=LoadBalancer now have finalizers, so cloud providers should be able to find and delete any associated resources without worrying about the Service object no longer existing.

@andrewsykim my understanding was that the tracking issue kubernetes/kubernetes#70159 had a different focus, which is to track an unambiguous ID for the cloud LB. Right now the only identifier that the Cloud Provider interface provides is the name; however, the name may change (even on purpose, to support LB renaming), so a different means seems necessary.

The finalizer seems to ensure that resources get deleted eventually, but it does not guarantee that I am able to reference the cloud LB easily, efficiently, and unambiguously at any point during its lifetime. That's why we started tracking the LB ID as an annotation on the Service at DigitalOcean, but that comes with a few restrictions. (See my comment #27 (comment) above.)
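To make the difference concrete, here's a rough sketch (using the same made-up annotation key as in the patch sketch above, and stand-in functions for the cloud API calls) of how a stored ID changes the lookup compared to relying on the finalizer plus a name-based search:

package provider

import (
	"context"
	"errors"

	v1 "k8s.io/api/core/v1"
	cloudprovider "k8s.io/cloud-provider"
)

// lb is a stand-in for whatever the cloud SDK returns for a load balancer.
type lb struct {
	ID   string
	Name string
}

// getLBByID and findLBByName are stand-ins for calls to the cloud's API.
func getLBByID(ctx context.Context, id string) (*lb, error) {
	return nil, errors.New("not implemented in this sketch")
}

func findLBByName(ctx context.Context, name string) (*lb, error) {
	return nil, errors.New("not implemented in this sketch")
}

// resolveLoadBalancer locates the cloud LB backing a Service. A previously
// recorded ID gives a direct, unambiguous lookup; without it, the provider
// has to search by the (default) name, which breaks down once names are
// allowed to change.
func resolveLoadBalancer(ctx context.Context, svc *v1.Service) (*lb, error) {
	// Same made-up annotation key as in the earlier patch sketch.
	if id := svc.Annotations["example.com/load-balancer-id"]; id != "" {
		return getLBByID(ctx, id)
	}
	return findLBByName(ctx, cloudprovider.DefaultLoadBalancerName(svc))
}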

that makes sense

/reopen

@andrewsykim: Reopened this issue.

In response to this:

that makes sense

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/remove-lifecycle rotten
/lifecycle frozen

@timoreimann let me know if you can work on this

@andrewsykim I can put together a PR as a basis for further discussion. In DigitalOcean's CCM, we've been "violating" the do-not-modify-Services advice from the GoDoc for quite some time in order to associate the LB ID with the Service. Having a "blessed" approach for doing so would be beneficial from my point of view.

I'm not sure if I can make it for 1.19 though.