Operator pod fails to start

Question

stephenl03 opened this issue 5 years ago

I don't know Go and I'm new to Kubernetes. I tried to install and run your fork of kube-plex and ran into errors. The main pod starts, but the operator pod fails. Any assistance would be greatly appreciated.

Here are the logs from the operator pod:

kubectl logs pod/plex-kube-plex-operator-7f6d87b674-r4mrf
{"level":"info","ts":1566266565.4639938,"logger":"cmd","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1566266565.464101,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1566266565.4641156,"logger":"cmd","msg":"Version of operator-sdk: v0.9.0"}
{"level":"info","ts":1566266565.4655316,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1566266565.5546064,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1566266565.5546658,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1566266565.613103,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1566266565.6134617,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"plextranscodejob-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1566266565.6600964,"logger":"kubebuilder.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"PlexTranscodeJob.plex.tv","error":"no matches for kind \"PlexTranscodeJob\" in version \"plex.tv/v1alpha1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/source/source.go:89\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Watch\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:122\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.add\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:77\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.Add\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:39\ngithub.com/mcadam/plex-operator/pkg/controller.AddToManager\n\tplex-operator/pkg/controller/controller.go:13\nmain.main\n\tplex-operator/cmd/manager/main.go:121\nruntime.main\n\t/usr/lib/go-1.12/src/runtime/proc.go:200"}
{"level":"error","ts":1566266565.6605635,"logger":"cmd","msg":"","error":"no matches for kind \"PlexTranscodeJob\" in version \"plex.tv/v1alpha1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\tplex-operator/cmd/manager/main.go:122\nruntime.main\n\t/usr/lib/go-1.12/src/runtime/proc.go:200"}
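
The stack traces make these JSON log lines hard to scan. A minimal sketch for printing only the "error" field of error-level lines, assuming every such line has the "error" key immediately followed by "stacktrace" as in the log above; errors_only is a hypothetical helper name, not part of the operator:

```shell
# Hypothetical helper: print only the "error" field of error-level log lines.
# Assumes the zap JSON layout seen above, where "error" is followed by "stacktrace".
errors_only() {
  sed -n 's/.*"level":"error".*"error":"\(.*\)","stacktrace":.*/\1/p'
}

# Usage against the operator pod from the log above:
#   kubectl logs pod/plex-kube-plex-operator-7f6d87b674-r4mrf | errors_only
```

Info-level lines are dropped, and the escaped quotes inside the message are left as they appear in the log.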

Adam · Answer 1 · Tue Aug 20 2019 22:40:20 GMT+0800 (China Standard Time)

Oh right, the CRD needs to be created and I didn't put it in the chart. I will update the chart and push it, sorry about that.

Adam · Answer 2 · Tue Aug 20 2019 22:55:46 GMT+0800 (China Standard Time)

OK, I just pushed a change to the kube-plex repo so that the CRD is deployed automatically.
To fix it on your side, either redeploy the chart, which should now include the CRD, or deploy the CRD manually using this file: https://github.com/mcadam/plex-operator/blob/master/deploy/crds/plex_v1alpha1_plextranscodejob_crd.yaml and then restart the operator pod.
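
The manual route could look roughly like the sketch below. It assumes the raw-content form of the linked file (the raw.githubusercontent.com URL is derived from the link above, not confirmed in this thread) and that the operator pod is managed by a Deployment, as its ReplicaSet-style name suggests, so deleting it acts as a restart. The function name is made up for illustration.

```shell
# KUBECTL can be set to "echo" to preview the commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Assumption: raw-content form of the CRD file linked above.
CRD_URL="https://raw.githubusercontent.com/mcadam/plex-operator/master/deploy/crds/plex_v1alpha1_plextranscodejob_crd.yaml"

# install_crd_and_restart <operator-pod-name> [namespace]
install_crd_and_restart() {
  pod="$1"
  ns="${2:-default}"
  # Register the PlexTranscodeJob CRD (CRDs are cluster-scoped, so no namespace).
  "$KUBECTL" apply -f "$CRD_URL" || return 1
  # Confirm the API server now knows the kind the operator failed to find.
  "$KUBECTL" get crd plextranscodejobs.plex.tv || return 1
  # Delete the operator pod; its Deployment creates a fresh one.
  "$KUBECTL" delete pod "$pod" -n "$ns"
}
```

For the pod in the original report this would be called as `install_crd_and_restart plex-kube-plex-operator-7f6d87b674-r4mrf`.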

Stephen Lewis · Answer 3 · Mon Aug 26 2019 05:50:20 GMT+0800 (China Standard Time)

I'll keep using this issue to work through the other problems I'm running into. That file fixed my initial issue. However, when the chart is deployed with a name (--set name=plex), the operator pod tries to use the 'plex' serviceaccount, and the chart doesn't create a service account with that name.

{"level":"error","ts":1566696904.0265555,"logger":"controller_plextranscodejob","msg":"Failed to create new idle transcoder Pod","Pod.Namespace":"default","Pod.Name":"","error":"pods \"plex-transcoder-\" is forbidden: error looking up service account default/plex: serviceaccount \"plex\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.(*ReconcilePlexTranscodeJob).ReconcileOperator\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:195\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.(*ReconcilePlexTranscodeJob).Reconcile\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:144\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}

I resolved that by deploying the chart without a name, but then ran into a new error:

{"level":"info","ts":1566750252.4216094,"logger":"cmd","msg":"failed to initialize service object for metrics: replicasets.apps \"punk-hydra-plex-operator-575475dd4b\" is forbidden: User \"system:serviceaccount:plex:punk-hydra-plex\" cannot get resource \"replicasets\" in API group \"apps\" in the namespace \"plex\""}

It looks like the operator's Role is missing 'replicasets' in its resources section.
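
One way to confirm and work around the missing permission from the command line is sketched below. The namespace and service account come from the error message above; the Role and RoleBinding name operator-replicasets is hypothetical and not something the chart defines. This is only a workaround sketch, since the lasting fix belongs in the chart's own Role.

```shell
# KUBECTL can be set to "echo" to preview the commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Names taken from the error message above.
NS="plex"
SA="punk-hydra-plex"

# Ask the API server whether the operator's service account may read ReplicaSets.
can_get_replicasets() {
  "$KUBECTL" auth can-i get replicasets.apps \
    --as="system:serviceaccount:${NS}:${SA}" -n "$NS"
}

# Grant read access through a separate Role and RoleBinding.
# "operator-replicasets" is a hypothetical name, not one the chart defines.
grant_replicasets() {
  "$KUBECTL" create role operator-replicasets \
    --verb=get --resource=replicasets.apps -n "$NS" || return 1
  "$KUBECTL" create rolebinding operator-replicasets \
    --role=operator-replicasets --serviceaccount="${NS}:${SA}" -n "$NS"
}
```

Running `can_get_replicasets` before and after `grant_replicasets` should show the answer change from "no" to "yes".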

I'm now hitting another error, and I'm not sure what causes this one:

E0825 20:49:41.659502       1 reflector.go:134] pkg/mod/k8s.io/client-go@v0.0.0-20190228174230-b40b2a5939e4/tools/cache/reflector.go:95: Failed to list *unstructured.Unstructured: the server could not find the requested resource
E0825 20:49:41.661675       1 reflector.go:134] pkg/mod/k8s.io/client-go@v0.0.0-20190228174230-b40b2a5939e4/tools/cache/reflector.go:95: Failed to list *v1alpha1.PlexTranscodeJob: the server could not find the requested resource (get plextranscodejobs.plex.tv)

Stephen Lewis · Answer 4 · Mon Aug 26 2019 06:25:55 GMT+0800 (China Standard Time)

It looks like the crd.yaml file doesn't get deployed with the chart. Once I created the custom resource definition manually, that last error went away. However, the service account error seems to be back:

{"level":"info","ts":1566771556.3483832,"logger":"controller_plextranscodejob","msg":"Idle workers queue status","Queue.Length":0,"Queue.Pods":[]}
{"level":"info","ts":1566771556.3485508,"logger":"controller_plextranscodejob","msg":"Creating a new idle transcoder Pod","Pod.Namespace":"plex","Pod.Name":""}
{"level":"error","ts":1566771556.3565018,"logger":"controller_plextranscodejob","msg":"Failed to create new idle transcoder Pod","Pod.Namespace":"plex","Pod.Name":"","error":"pods \"plex-transcoder-\" is forbidden: error looking up service account plex/plex: serviceaccount \"plex\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.(*ReconcilePlexTranscodeJob).ReconcileOperator\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:195\ngithub.com/mcadam/plex-operator/pkg/controller/plextranscodejob.(*ReconcilePlexTranscodeJob).Reconcile\n\tplex-operator/pkg/controller/plextranscodejob/plextranscodejob_controller.go:144\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1566771556.3566318,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"plextranscodejob-controller","request":"plex/plex-transcoder-grpbj","error":"pods \"plex-transcoder-\" is forbidden: error looking up service account plex/plex: serviceaccount \"plex\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/root/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}

Stephen Lewis · Answer 5 · Mon Aug 26 2019 12:55:54 GMT+0800 (China Standard Time)

I got this working by creating a serviceaccount named plex, plus new roles and a rolebinding to match. I'm not sure whether something needs to be tweaked so that the operator picks up the serviceaccount name that gets created during the chart install.
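
For anyone hitting the same error, the workaround might be sketched as follows. The rules the transcoder pods actually need are not listed in this thread, so the Role to bind is left as a parameter; the function names and the rolebinding name plex are assumptions for illustration.

```shell
# KUBECTL can be set to "echo" to preview the commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# list_service_accounts <namespace>
# Shows which service accounts exist, to compare with the name in the error.
list_service_accounts() {
  "$KUBECTL" get serviceaccounts -n "$1"
}

# create_transcoder_sa <namespace> <role-name>
# Creates the "plex" service account the operator looks up and binds it to an
# existing Role. <role-name> is a placeholder: the rules the transcoder pods
# need are not listed in this thread.
create_transcoder_sa() {
  ns="$1"
  role="$2"
  "$KUBECTL" create serviceaccount plex -n "$ns" || return 1
  "$KUBECTL" create rolebinding plex --role="$role" \
    --serviceaccount="${ns}:plex" -n "$ns"
}
```

In the plex namespace from the logs above, this would be called as `create_transcoder_sa plex <role-name>`, with the Role created beforehand.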

I'm not sure whether you want to close the issue now that I've worked around my problems, or wait until the code has been fixed. Thanks for your help either way.

Adam · Answer 6 · Mon Aug 26 2019 23:16:10 GMT+0800 (China Standard Time)

I ran it with the chart name forced to plex. I should redeploy from scratch to make sure everything works as it's supposed to.
It's still in alpha and some things are still hardcoded in there, so thanks for raising it. I will let you know when I've pushed a fix.

Adam · Answer 7 · Tue Aug 27 2019 00:20:32 GMT+0800 (China Standard Time)

I pushed a fix so that the service account name is picked up correctly and is no longer hardcoded.
The other error, failed to initialize service object for metrics: replicasets.apps \"punk-hydra-plex-operator-575475dd4b\", is about the operator's metrics. It's OK for that to fail; the operator will still work. I haven't looked at those metrics yet, or at whether we need or want them. From a quick read a while ago, by default they report the number of custom resources and that kind of thing. That might be nice to have, but it would be a separate issue for the future.