vmware-archive / kubecfg

A tool for managing complex enterprise Kubernetes environments as code.

kubecfg with custom resources fails validation

danderson opened this issue

Setup: use kubecfg to define a CustomResourceDefinition, and then an object of that type.

kubecfg update will fail validation because, at validation time, the cluster doesn't yet know about the custom resource type, so the second object isn't valid. But if the CRD is applied before the object (which it is, under the current resource ordering), the update will result in a fully valid state.

If the server says it doesn't know about a resource type, but the update has a CustomResourceDefinition for that type, kubecfg should relax validation slightly and allow the object to pass validation with limited checking.
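The relaxation proposed above could be sketched as a two-pass check: first collect the group/kind pairs that CRDs in the same update will define, then accept an object whose type the server doesn't know only if one of those CRDs defines it. This is a minimal illustration under stated assumptions, not kubecfg's code: `Object`, `groupKind`, `validate` and the `serverKnows` callback (standing in for a discovery lookup against the cluster) are all hypothetical, and the CRD's spec is flattened into `CRDGroup`/`CRDKind` for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Object is a hypothetical, minimal stand-in for a manifest object; only the
// fields this sketch needs are modelled.
type Object struct {
	APIVersion string // e.g. "example.com/v1"
	Kind       string // e.g. "Widget"
	// Set only on CustomResourceDefinition objects: the type they define.
	CRDGroup, CRDKind string
}

type groupKind struct{ Group, Kind string }

// groupOf returns the API group of an apiVersion ("" for the core group "v1").
func groupOf(apiVersion string) string {
	if i := strings.IndexByte(apiVersion, '/'); i >= 0 {
		return apiVersion[:i]
	}
	return ""
}

// validate checks every object in one update. serverKnows is a hypothetical
// callback standing in for a discovery lookup against the live cluster.
func validate(objs []Object, serverKnows func(groupKind) bool) []error {
	// First pass: collect the types that CRDs in this same update will create.
	pending := map[groupKind]bool{}
	for _, o := range objs {
		if o.Kind == "CustomResourceDefinition" {
			pending[groupKind{o.CRDGroup, o.CRDKind}] = true
		}
	}
	// Second pass: validate each object.
	var errs []error
	for _, o := range objs {
		gk := groupKind{groupOf(o.APIVersion), o.Kind}
		switch {
		case serverKnows(gk):
			// Known type: full schema validation would run here.
		case pending[gk]:
			// Unknown to the server but defined by a CRD in this update:
			// relaxed path. This sketch performs no further checks.
		default:
			errs = append(errs, fmt.Errorf("unknown type %s/%s", gk.Group, gk.Kind))
		}
	}
	return errs
}

func main() {
	objs := []Object{
		{APIVersion: "apiextensions.k8s.io/v1beta1", Kind: "CustomResourceDefinition",
			CRDGroup: "example.com", CRDKind: "Widget"},
		{APIVersion: "example.com/v1", Kind: "Widget"},   // allowed: CRD in batch
		{APIVersion: "other.example/v1", Kind: "Gadget"}, // rejected: no CRD
	}
	// Pretend the cluster only serves the apiextensions group.
	known := func(gk groupKind) bool { return gk.Group == "apiextensions.k8s.io" }
	for _, err := range validate(objs, known) {
		fmt.Println(err)
	}
}
```

Only `other.example/Gadget` is reported; the `Widget` passes because its CRD travels in the same update. Note that this is exactly the kind of object-specific semantic knowledge the maintainer is wary of in the reply below.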

Right. Your solutions are to either:

  • split the CRD create/use into two separate changes through your CI/CD pipeline,
  • use kubecfg update --validate=false (to go back to old behaviour), or
  • use the unreleased kubecfg update --ignore-unknown flag (see #208, #209).

As far as I can see, the only alternative is implementing deep knowledge of CRDs/TPRs and using the yet-to-actually-be-created CRD resource to validate other resources in the same update. At this stage, I don't wish to take that path because a) I have historically tried hard to avoid semantic knowledge of objects, because that approach is fragile and ties kubecfg to particular k8s releases, and b) it still wouldn't work for aggregated API servers, where the knowledge of new types requires executing the new apiserver.

For comparison, kubectl always silently ignores unknown apiVersions/kinds (from my reading of the code a while ago).

Opinions sought, particularly regarding what the default behaviour should be. Right now I think this is addressed by the upcoming validate --ignore-unknown opt-in flag, possibly with some improved docs or error messages.

Hrm. Brainstorming a bit more:

One option would be to default --ignore-unknown=true for update, but --ignore-unknown=false for validate. This kind of makes sense because update is going to go on to actually push to the server anyway, and the update will eventually hit a real server-side error if/when it hits unknown types. On the other hand, this reasoning is much like just disabling pre-update validation (--validate=false), it is "surprising" in that update is no longer the same as validate && update --validate=false, and the entire goal of the feature was to try to catch "known" errors before making any server changes.
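To make this first option concrete, here is a minimal sketch of per-subcommand defaults for `--ignore-unknown`. It uses Go's standard `flag` package purely as a stand-in for kubecfg's real CLI wiring, and `newFlags` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlags builds the flag set for one subcommand. Go's standard flag package
// is used here only as a stand-in for kubecfg's actual CLI wiring.
func newFlags(cmd string) (*flag.FlagSet, *bool) {
	fs := flag.NewFlagSet(cmd, flag.ContinueOnError)
	// Proposed defaults: liberal for "update" (the server will still reject
	// real errors when objects are pushed), conservative for "validate"
	// (whose whole purpose is catching errors before any change is made).
	ignoreUnknown := fs.Bool("ignore-unknown", cmd == "update",
		"do not fail on kinds the server does not (yet) know about")
	return fs, ignoreUnknown
}

func main() {
	for _, cmd := range []string{"update", "validate"} {
		fs, ignoreUnknown := newFlags(cmd)
		if err := fs.Parse(nil); err != nil { // no flags given: defaults apply
			panic(err)
		}
		fmt.Printf("%s: --ignore-unknown=%v\n", cmd, *ignoreUnknown)
	}
}
```

Running it prints `update: --ignore-unknown=true` and `validate: --ignore-unknown=false`. An explicit `--ignore-unknown=false` still overrides the default, but the sketch also shows the surprise described above: with no flags, `update` accepts batches that `validate` rejects.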

I could also base the --ignore-unknown default on whether the list of resources contains any CRD/TPR/APIService resources. This is more likely to reflect the situations where users might need to provide this flag in practice, but means we might have different defaults when run on a subset of resources, which is definitely "surprising".
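The content-based alternative might look like the sketch below, which defaults to true only when the batch itself contains a CRD, TPR, or APIService. The `TypeMeta` type and `defaultIgnoreUnknown` function are hypothetical; matching on group as well as kind is a design choice so that an unrelated custom kind that happens to be called `APIService` doesn't trigger it.

```go
package main

import (
	"fmt"
	"strings"
)

// TypeMeta is a hypothetical minimal view of an object's apiVersion and kind.
type TypeMeta struct{ APIVersion, Kind string }

// typeRegistering lists the group/kind pairs that introduce new API types:
// CRDs, the older ThirdPartyResources, and aggregated-apiserver APIServices.
var typeRegistering = map[[2]string]bool{
	{"apiextensions.k8s.io", "CustomResourceDefinition"}: true,
	{"extensions", "ThirdPartyResource"}:                 true,
	{"apiregistration.k8s.io", "APIService"}:             true,
}

// defaultIgnoreUnknown derives the --ignore-unknown default from the batch
// itself: true iff some object in it registers new API types.
func defaultIgnoreUnknown(objs []TypeMeta) bool {
	for _, o := range objs {
		group := ""
		if i := strings.IndexByte(o.APIVersion, '/'); i >= 0 {
			group = o.APIVersion[:i]
		}
		if typeRegistering[[2]string{group, o.Kind}] {
			return true
		}
	}
	return false
}

func main() {
	crd := TypeMeta{"apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"}
	widget := TypeMeta{"example.com/v1", "Widget"}
	fmt.Println(defaultIgnoreUnknown([]TypeMeta{crd, widget})) // true
	fmt.Println(defaultIgnoreUnknown([]TypeMeta{widget}))      // false
}
```

The two calls in `main` illustrate the surprising part: the same `Widget` object gets a different default depending on whether its CRD is in the subset being processed.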

In general, I want kubecfg to "just do the right thing" without requiring flags. I'm having trouble working out whether the "right thing" in this case is to be conservative or liberal :/ I suspect "just works" says I should change the default to be --ignore-unknown=true (for both update and validate)...

@mmikulicic: any opinions on ^

One option would be to default --ignore-unknown=true for update, but --ignore-unknown=false for validate [...]

I also like this idea (after all, validation was only recently added to update); it is in tune with keeping kubecfg as semantics-agnostic as possible, since semantic knowledge can become an API-dependent nightmare to maintain.

If making kubecfg semantics-agnostic means less work for kubecfg maintainers at the expense of kubecfg users, I'd rather have kubecfg try to understand CRDs by default and let the user optionally skip these smarts if they don't work on future k8s versions.

it still wouldn't work for aggregated API servers, where the knowledge of new types requires executing the new apiserver.

Doesn't this mean that, generally speaking, you cannot deploy a custom API server in the same "batch" as the resources using it, regardless of kubecfg? (i.e. the resource creation will be rejected by the k8s apiserver, either because no apiregistration.k8s.io/v1beta1::APIService resource is registered for that group yet, or because the custom apiserver instance is not yet serving requests)

Wouldn't that imply that we just don't have to bother with this scenario as it's not supported for deeper reasons?

Doesn't this mean that, generally speaking, you cannot deploy a custom API server in the same "batch" as the resources using it, regardless of kubecfg?

Erm, yes. You need to register+start the custom apiserver before trying to manipulate any resources that are handled by that apiserver. Recognising that dependency without cheating (annotations to override sort order?) is "difficult", so yes, I think we effectively require two separate runs of kubecfg (i.e. externally imposed ordering) in this case.

I think this example is still relevant to this discussion, since it implies that the second config "batch" still can't be schema-validated without actually installing the first batch. If we want to support kubecfg validate on the second batch, we still need some sort of --ignore-unknown, although I agree the kubecfg *update* case is not relevant.
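For that second-batch case, a lenient validation mode might skip objects whose types the server does not serve yet, rather than failing the whole run. This is a hypothetical sketch of such semantics (`GK`, `validateBatch`, and the `served` map standing in for server discovery are all assumptions), not the behaviour of the actual `--ignore-unknown` flag from #208/#209.

```go
package main

import "fmt"

// GK is a hypothetical minimal identifier for an object's API group and kind.
type GK struct{ Group, Kind string }

// validateBatch checks a batch against the types the server serves right now.
// With ignoreUnknown set, objects of unserved types are skipped and returned
// instead of failing the run, e.g. when validating a second batch whose types
// only exist once the first batch has been applied.
func validateBatch(objs []GK, served map[GK]bool, ignoreUnknown bool) ([]GK, error) {
	var skipped []GK
	for _, o := range objs {
		if served[o] {
			continue // served type: full schema validation would run here
		}
		if !ignoreUnknown {
			return nil, fmt.Errorf("server does not serve %s/%s", o.Group, o.Kind)
		}
		skipped = append(skipped, o)
	}
	return skipped, nil
}

func main() {
	served := map[GK]bool{{"apps", "Deployment"}: true}
	batch2 := []GK{{"apps", "Deployment"}, {"metrics.example.com", "Widget"}}

	if _, err := validateBatch(batch2, served, false); err != nil {
		fmt.Println("strict:", err)
	}
	skipped, _ := validateBatch(batch2, served, true)
	fmt.Println("lenient, skipped:", skipped)
}
```

Returning the skipped objects, rather than dropping them silently as kubectl reportedly does, leaves room for a warning that tells the user what was not checked.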

For some reason, I still need to pass --ignore-unknown on the command line even though its default value is supposed to be true.