[blocker] Nexus Deployment stuck on OpenShift 4.x
ricardozanini opened this issue · comments
Describe the bug
When trying to create a simple Nexus3 instance with the example "CentOS No Persistence", I got:
2020-11-01T14:29:25.381Z INFO controllers.Nexus Reconciling Nexus
2020-11-01T14:29:25.389Z DEBUG Fetching the latest micro from minor 28
2020-11-01T14:29:25.389Z DEBUG Replacing 'spec.image' (docker.io/sonatype/nexus3:3.28.1) with 'docker.io/sonatype/nexus3:3.28.1'
2020-11-01T14:29:25.396Z INFO Generating required resources
2020-11-01T14:29:25.396Z DEBUG Generating required Deployment
2020-11-01T14:29:25.396Z DEBUG Generating required Service
2020-11-01T14:29:25.396Z DEBUG Generating required Service Account
2020-11-01T14:29:25.396Z DEBUG Generating required Secret
2020-11-01T14:29:25.396Z INFO Fetching deployed resources
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Deployment"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Deployment
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Service"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Service
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Persistent Volume Claim"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Persistent Volume Claim
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Secret"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Secret
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Service Account"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Service Account
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Route"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Route
2020-11-01T14:29:25.396Z INFO Attempting to fetch {"deployed": "Ingress"}
2020-11-01T14:29:25.396Z DEBUG There is no deployed Ingress
2020-11-01T14:29:25.396Z INFO controllers.Nexus Will {"create ": 1, ", update ": 0, ", delete ": 0, " instances of ": "v1.Deployment"}
2020-11-01T14:29:25.410Z INFO controllers.Nexus Updating application status before leaving
2020-11-01T14:29:25.410Z INFO controllers.Nexus Checking Deployment Status
2020-11-01T14:29:25.410Z INFO controllers.Nexus Controller finished reconciliation
2020-11-01T14:29:25.410Z ERROR controller Reconciler error {"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "deployments.apps \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:246
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:197
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:90
It gets stuck at the deployment creation.
To Reproduce
Steps to reproduce the behavior:
- Deploy the operator with the nexus-operator.yaml file
- Deploy the example "CentOS No Persistence"
Expected behavior
Deployment and other resources to be created
Environment
OpenShift 4.5
This same error happens to whatever resource we try to create:
2020-11-01T14:35:00.133Z ERROR controller Reconciler error {"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
2020-11-01T14:35:00.133Z ERROR controller Reconciler error {"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
Not only deployments.
Looks like we need to add permissions for finalizers.
It's strange, because we already have permissions to use the update verb in deployment/finalizers, see this. Perhaps that's not enough on newer OCP versions? Not sure why though.
We should be able to slowly add the permissions (the API object is in the format of <kind>/finalizers and belongs to <kind>'s group from what I looked around) and fix the issue by only adding what's necessary.
We could start by adding all verbs to deployment/finalizers and secret/finalizers (and whatever other Kinds we see in the logs; I mention these two because they're the only ones that show up in the issue), to first make sure they work correctly and that we're really just facing a permission issue. Once that's confirmed, we can try adding one verb at a time.
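For illustration, here is a rough sketch of what those rules could look like. It uses a local stand-in struct instead of the real `k8s.io/api/rbac/v1` types so it runs on its own, and `finalizersRule` is a hypothetical helper, not something from the operator. One assumption worth calling out: RBAC rules name the lowercase plural resource (`deployments`, `secrets`), so the sketch uses `deployments/finalizers` rather than the Kind.

```go
package main

import "fmt"

// PolicyRule is a local stand-in that mirrors the shape of an RBAC rule
// (API groups, resources, verbs). It is NOT the real rbac/v1 type.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// finalizersRule (hypothetical helper) builds a rule that targets the
// "finalizers" subresource of the given resource in the given API group.
func finalizersRule(group, resource string, verbs ...string) PolicyRule {
	return PolicyRule{
		APIGroups: []string{group},
		Resources: []string{resource + "/finalizers"},
		Verbs:     verbs,
	}
}

func main() {
	// Start broad ("*") to confirm this is only a permission problem,
	// then narrow the verbs down one at a time.
	rules := []PolicyRule{
		finalizersRule("apps", "deployments", "*"),
		finalizersRule("", "secrets", "*"), // "" is the core API group
	}
	for _, r := range rules {
		fmt.Printf("groups=%q resources=%q verbs=%q\n", r.APIGroups, r.Resources, r.Verbs)
	}
}
```

Once the broad version is confirmed to work, swapping `"*"` for `"update"` in the calls above would be the natural first narrowing step.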
Unfortunately I can't test this myself as I can't run CRC, but let me know if I can be of any help some other way.
Not sure if it's a problem in the permissions or the way we are defining the owner and the options for the finalizers for each resource. I'll take a look later. 🤤
SDK 1.0.1 left this behind in their docs:
https://sdk.operatorframework.io/docs/building-operators/golang/tutorial/#specify-permissions-and-generate-rbac-manifests
Here are more details about this problem:
operator-framework/operator-sdk#3477
Admission controller on OpenShift is enforcing this:
https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#ownerreferencespermissionenforcement
Won't happen on vanilla Kubernetes.
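To make the linked admission controller behaviour concrete: per the Kubernetes docs, setting `blockOwnerDeletion` on an owner reference requires `update` permission on the finalizers subresource of the referenced owner, not of the object being created. The toy model below is only a sketch of that check, not the real plugin code. `Rule` and `canBlockOwnerDeletion` are made-up names, the matching ignores wildcard groups/resources and resource names, and the `nexus` resource name for the custom resource is an assumption on my part.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for one RBAC rule:
// a single API group, a single resource, and a list of verbs.
type Rule struct {
	Group    string
	Resource string
	Verbs    []string
}

// allows reports whether this rule grants verb on group/resource.
// Only the "*" verb wildcard is modelled here.
func (r Rule) allows(group, resource, verb string) bool {
	if r.Group != group || r.Resource != resource {
		return false
	}
	for _, v := range r.Verbs {
		if v == verb || v == "*" {
			return true
		}
	}
	return false
}

// canBlockOwnerDeletion models the documented check: the caller needs
// "update" on "<owner resource>/finalizers" in the owner's API group.
func canBlockOwnerDeletion(rules []Rule, ownerGroup, ownerResource string) bool {
	for _, r := range rules {
		if r.allows(ownerGroup, ownerResource+"/finalizers", "update") {
			return true
		}
	}
	return false
}

func main() {
	// Permission on the dependent's finalizers only, as we have today.
	rules := []Rule{
		{Group: "apps", Resource: "deployments/finalizers", Verbs: []string{"update"}},
	}
	fmt.Println(canBlockOwnerDeletion(rules, "apps.m88i.io", "nexus")) // false

	// Add permission on the owner's finalizers (resource name assumed).
	rules = append(rules, Rule{
		Group: "apps.m88i.io", Resource: "nexus/finalizers", Verbs: []string{"update"},
	})
	fmt.Println(canBlockOwnerDeletion(rules, "apps.m88i.io", "nexus")) // true
}
```

If this model matches what OpenShift is doing, it would explain why the existing `update` permission on deployment/finalizers doesn't help: the owner in every failing ownerReference is the Nexus resource itself.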