vmware-archive / kubecfg

A tool for managing complex enterprise Kubernetes environments as code.

"kubecfg update" removes server-set default values (v1/Service "ports[].nodePort" field)

seh opened this issue

Summary

When using kubecfg update to apply a set of manifests to the API server, Service objects of type "NodePort" that already exist lose the ports[].nodePort values the server had assigned, prompting the server to allocate fresh node ports. This mutation occurs whether or not we've changed anything in the source manifest for the Service object.

This issue follows preceding discussion in the "ksonnet" channel of the "Kubernetes" Slack team.

Reproducible Example

Consider this Service manifest:

apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  selector:
    app: example
  ports:
  - protocol: TCP
    port: 80
  type: NodePort

Note that the ports[].nodePort fields are absent. If we apply it three times in succession—without editing it in between—we see that kubecfg update induces the API server to supply a fresh node port each time.

% kubecfg update --namespace=default service.yaml
INFO  Validating services example
INFO  validate object "/v1, Kind=Service"
INFO  Fetching schemas for 1 resources
INFO  Updating services example
INFO  Creating non-existent services example

% kubectl --namespace=default get service example
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
example   NodePort   10.104.95.202   <none>        80:32585/TCP   25s

% kubecfg update --namespace=default service.yaml
INFO  Validating services example
INFO  validate object "/v1, Kind=Service"
INFO  Fetching schemas for 1 resources
INFO  Updating services example

% kubectl --namespace=default get service example
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
example   NodePort   10.104.95.202   <none>        80:30499/TCP   39s

% kubecfg update --namespace=default service.yaml
INFO  Validating services example
INFO  validate object "/v1, Kind=Service"
INFO  Fetching schemas for 1 resources
INFO  Updating services example

% kubectl --namespace=default get service example
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
example   NodePort   10.104.95.202   <none>        80:31269/TCP   11m

After creating the Service, we get node port 32585. Then applying the manifest again yields node port 30499 and then 31269, when no changes should have occurred the second or third time.

I suspect the problem is due to kubecfg update sending a patch that overwrites the core/v1.ServicePort object, leaving the "NodePort" field with its zero value, which the API server then takes as a request to assign a fresh port.

Versions

  • kubecfg
    kubecfg version: v0.9.0
    jsonnet version: v0.10.0
    client-go version: v0.0.0-master+$Format:%h$
  • kubectl
    version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
  • Kubernetes server components
    version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Aha. This is because ports[] is an array, and the semantics of a json-merge-patch (as used by kubecfg) are that arrays are replaced wholesale. This was the motivation for the k8s "strategic merge patch" type, which has lots of k8s-specific logic that tries to work out how to "merge" json arrays that are conceptually named maps or sets in the k8s types.
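To make the failure mode concrete: a json-merge-patch (RFC 7386) built from the example manifest carries the whole ports array, and merge-patch semantics replace the target array wholesale, so the server receives a ServicePort with no nodePort set and allocates a fresh one. A sketch of such a patch body (illustrative only, not kubecfg's literal wire traffic):

{
  "spec": {
    "ports": [
      {"protocol": "TCP", "port": 80}
    ]
  }
}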

So: this fix is not trivial and requires replacing the core update primitive. The (easy-ish) solution is to replace it with strategic-merge-patch (this requires other changes too) and do what kubectl apply does. Unfortunately that approach has issues of its own, so it isn't as appealing as it might seem (eg: storing the last-applied copy in an annotation halves the maximum usable size of k8s objects, it doesn't support more than one merge-updater, support for CRDs is poor, etc). I knew improvements here were necessary, but so far had (surprisingly?) not found a hard failure example important enough to raise the priority of this work. I think this bug is that example, and a good test case for future improvements.

The workaround in the meantime is simple (but annoying): set an explicit nodePort value in these services, in the yaml/jsonnet source.
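Applied to the example manifest, that workaround would look something like this (32585 is just the port the server happened to allocate above; any unused port in the cluster's node-port range, 30000-32767 by default, works):

apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  selector:
    app: example
  ports:
  - protocol: TCP
    port: 80
    nodePort: 32585
  type: NodePort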

Thank you for the explanation. I understand that fixing this isn't easy.

Yes, I think it will come to that. In this case, since I'll probably be taking this node port and asking some other administrators to make use of it, actually choosing it and committing the value to the VCS is justifiable.

In general, though, we should do no worse than what kubectl apply has managed to do. I do lament the prospect of having to store the previously applied value in order to arrange for a three-way merge. I think that would solve this case, assuming that there's only one updating party, as you mentioned.

If we used strategic merge patch instead, given that the core/v1.Service.Ports field is annotated to use the "Port" field as its key for the "merge" patch strategy, what happens with the "NodePort" field? If the client-submitted JSON object arrives without a "nodePort" field, does strategic merge patch preserve the field that's present in the server's version of the target object?

Yes, s-m-p would preserve the nodePort present in the server's version.
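For reference, the published OpenAPI schema exposes that merge key as patch metadata on the ports field of io.k8s.api.core.v1.ServiceSpec, roughly:

"ports": {
  "type": "array",
  "items": {"$ref": "#/definitions/io.k8s.api.core.v1.ServicePort"},
  "x-kubernetes-patch-merge-key": "port",
  "x-kubernetes-patch-strategy": "merge"
}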

If kubecfg used s-m-p, it would look like this:

  1. Before creating the Service, kubecfg stores a JSON-encoded copy of the original object in a particular annotation.
  2. Send to server with a normal "create" operation.
  3. Some k8s-internal controller allocates a specific nodePort value and updates the ports[0].nodePort field.
  4. A later kubecfg update to this Service sends the object via a type=strategic-merge-patch "patch" operation (with an updated json-encoded copy in the annotation, to use as the base for future updates). At this point the server does:
    1. Pull out the "original" object from the annotation on the current/existing value.
    2. Generate a "diff" between the patch request body and the original object.
    3. Apply the diff to the current/existing value of the object.

The server uses hard-coded internal knowledge about v1.Service.Ports (the fact that the ports[] array is conceptually a map keyed by port) to make the "diff" generation/application ignore the actual ordering of elements in the ports[] array.
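Sketched on the example Service above (using kubectl's annotation name, kubectl.kubernetes.io/last-applied-configuration, as a stand-in for whatever kubecfg would use):

# "original" (from the annotation) and the new manifest both contain:
ports:
- protocol: TCP
  port: 80

# live object, after the node-port controller has run:
ports:
- protocol: TCP
  port: 80
  nodePort: 32585

# With merge key "port", the port=80 entries are matched up rather than
# replaced; original and new agree, so the generated patch is empty and
# nodePort 32585 survives the update.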


This "hard-coded internal knowledge" is what makes this all a bit unpleasant, since the client can't actually know what's going to happen. For example, if you specify a new array of podspec.containers, do you want to replace the existing set of containers, or merge in with the existing set? Good luck working out what the server will do.

One solution to this is to run the s-m-p logic client-side, which is technically not too difficult, because you can just link in the relevant golang library. ksonnet/ksonnet does this in newer versions, for comparison. This is good for predictability, but it means the client now has to contain all the hard-coded internal knowledge, which makes the code much larger and ties us strongly to particular k8s server/schema versions. In either case (server- or client-side), "2nd tier" objects like CRDs, or (with client-side) non-core objects like OpenShift's, lose out because they don't have an opportunity to embed their specific logic everywhere.

This last bit is getting slowly better as the mergeKey metadata is added to the exported openapi schema, and CRDs are allowed to publish schemas. It's literally years behind "tier 1" object types, however.

I've been thinking about this for years (since first starting work on kubecfg), and I'm pretty sure I can intuit most cases from the openapi schema, without needing a 3-way merge. I just need to code it up (and teach everyone about a different set of odd corner cases). Unfortunately there's never enough time :(

Another case came to light: a Deployment whose "spec.template.spec.containers" entry lacks a "resources" field, to which some other agent (such as addon-resizer) later adds that field. The next time I run kubecfg update, even if I haven't changed anything in the Deployment's manifest, kubecfg overwrites that container entry (again with no "resources" field), which rolls out a new ReplicaSet; addon-resizer then adjusts the Deployment again, and the cycle repeats.

I'm not sure whether a strategic merge patch would preserve the "resources" field added by the addon-resizer.

Another case: Removing annotations from a PodSecurityPolicy. First I defined the annotations I wanted, and had kubecfg put them in place while creating the object. Later I decided to remove the annotations, deleted them from my manifest, and ran kubecfg update. It fails to remove the annotations.
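That presumably follows from the same merge-patch mechanics: under json-merge-patch, deleting a map key requires sending it explicitly as null, and a patch built only from the new manifest (where the annotations are simply absent) never contains that null, so the server keeps them. A removal patch would have to look something like this (annotation key invented for illustration):

{
  "metadata": {
    "annotations": {
      "example.com/some-policy": null
    }
  }
}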

Thank you! 🎉