m88i / nexus-operator

Sonatype Nexus OSS Kubernetes Operator based on Operator SDK

Home Page: http://operatorhub.io/operator/nexus-operator-m88i

[blocker] Nexus Deployment stuck on OpenShift 4.x

ricardozanini opened this issue

Describe the bug
When trying to create a simple Nexus3 instance with the example "CentOS No Persistence", I got:

2020-11-01T14:29:25.381Z	INFO	controllers.Nexus	Reconciling Nexus
2020-11-01T14:29:25.389Z	DEBUG	Fetching the latest micro from minor 28
2020-11-01T14:29:25.389Z	DEBUG	Replacing 'spec.image' (docker.io/sonatype/nexus3:3.28.1) with 'docker.io/sonatype/nexus3:3.28.1'
2020-11-01T14:29:25.396Z	INFO	Generating required resources
2020-11-01T14:29:25.396Z	DEBUG	Generating required Deployment
2020-11-01T14:29:25.396Z	DEBUG	Generating required Service
2020-11-01T14:29:25.396Z	DEBUG	Generating required Service Account
2020-11-01T14:29:25.396Z	DEBUG	Generating required Secret
2020-11-01T14:29:25.396Z	INFO	Fetching deployed resources
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Deployment"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Deployment
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Service"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Service
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Persistent Volume Claim"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Persistent Volume Claim
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Secret"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Secret
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Service Account"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Service Account
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Route"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Route
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Ingress"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Ingress
2020-11-01T14:29:25.396Z	INFO	controllers.Nexus	Will 	{"create ": 1, ", update ": 0, ", delete ": 0, " instances of ": "v1.Deployment"}
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Updating application status before leaving
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Checking Deployment Status
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Controller finished reconciliation
2020-11-01T14:29:25.410Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "deployments.apps \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:246
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:197
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:90

It gets stuck at the Deployment creation step.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the operator with the nexus-operator.yaml file
  2. Deploy the example "CentOS No Persistence"

Expected behavior
The Deployment and the other resources should be created.

Environment
OpenShift 4.5

The same error happens for every resource we try to create:

2020-11-01T14:35:00.133Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
2020-11-01T14:35:00.133Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}

So this affects more than just Deployments.

It looks like we need to add RBAC permissions for finalizers.

It's strange, because we already have permission to use the update verb on deployments/finalizers (see this). Perhaps that's not enough on newer OCP versions? I'm not sure why, though.

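We should be able to add the permissions incrementally and fix the issue by adding only what's necessary. From what I've looked around, the RBAC resource has the format <resource>/finalizers, where <resource> is the plural resource name (e.g. deployments/finalizers), and it belongs to the same API group as <resource>.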

We could start by granting all verbs on deployments/finalizers and secrets/finalizers (plus any other resources that show up in the logs; I'm mentioning these two only because they're the ones reported in this issue) to first confirm that creation works and that we're really just facing a permission issue. Once that's confirmed, we can try adding one verb at a time.

Unfortunately I can't test this myself as I can't run CRC, but let me know if I can be of any help some other way.

I'm not sure whether it's a permissions problem or a problem with how we define the owner and the finalizer options for each resource. I'll take a look later. 🤤
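One detail that may help tell the two possibilities apart: as far as I know, controller-runtime's controllerutil.SetControllerReference sets both Controller and BlockOwnerDeletion to true, and the admission check that produces this error looks at the finalizers of the owner named in the reference (the Nexus resource in the apps.m88i.io group), not those of the Deployment or Secret being created. The sketch below mirrors a subset of metav1.OwnerReference with local types to stay standard-library only; the API version v1alpha1 and all helper names are assumptions for illustration.

```go
package main

import "fmt"

// ownerReference mirrors a subset of metav1.OwnerReference's fields
// (k8s.io/apimachinery) so this sketch has no external dependencies.
type ownerReference struct {
	APIVersion         string
	Kind               string
	Name               string
	Controller         *bool
	BlockOwnerDeletion *bool
}

func boolPtr(b bool) *bool { return &b }

// controllerRef builds a reference with the same flags that
// controller-runtime's SetControllerReference sets: Controller and
// BlockOwnerDeletion are both true.
func controllerRef(apiVersion, kind, name string) ownerReference {
	return ownerReference{
		APIVersion:         apiVersion,
		Kind:               kind,
		Name:               name,
		Controller:         boolPtr(true),
		BlockOwnerDeletion: boolPtr(true),
	}
}

// needsOwnerFinalizerUpdate reports whether creating an object that carries
// this reference would require "update" on the OWNER's finalizers
// subresource, per our understanding of the admission check behind the error.
func needsOwnerFinalizerUpdate(ref ownerReference) bool {
	return ref.BlockOwnerDeletion != nil && *ref.BlockOwnerDeletion
}

func main() {
	// Assumed API version; the owner is the Nexus CR from the example.
	ref := controllerRef("apps.m88i.io/v1alpha1", "Nexus", "nexus3")
	if needsOwnerFinalizerUpdate(ref) {
		fmt.Printf("creating owned objects requires update on the finalizers of %s %q (%s)\n",
			ref.Kind, ref.Name, ref.APIVersion)
	}
}
```

If that reading is right, the permission to check would be on the Nexus resource's finalizers rather than on deployments/finalizers or secrets/finalizers.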