IBM / cp4waiops-gitops

Manage Your IBM Cloud Pak for Watson AIOps With GitOps

Home Page: https://ibm.github.io/cp4waiops-gitops/docs/

Gitops : Error creating : pods xxxx is forbidden

Gilles-Plaquet opened this issue

While trying to deploy the Event Manager and the AI Manager, I ran into a permissions issue that results in a "failed to create x" error.
I attached a screenshot of the error. I get the same error on multiple resources that try to create objects.

I already checked that my Argo CD has the required cluster role bindings. Just to make sure, I also added a screenshot of the YAML file of this role binding.

Hoping someone can help me resolve this issue!
Thanks in advance.

Kind regards,
Gilles

Screenshot 2022-11-29 at 15 33 44

Screenshot 2022-11-29 at 17 22 00

Screenshot 2022-11-29 at 17 03 25

@Gilles-Plaquet can you provide more info: at which step of the document at https://ibm.github.io/cp4waiops-gitops/docs/how-to-deploy-cp4waiops-35 did it fail? What is your OCP version?

@gyliu513 the current OpenShift version is 4.8.39. The documentation states that it should be above 4.5, so I guess that should be fine.

I was able to create the ceph-cluster and the shared application without any issues (all app details seem to be healthy there).
I guess it was when the AI Manager installation started that I noticed part of the application becoming degraded. Then I started to get the issue stated above.

Hope this information helped.

@Gilles-Plaquet can you log in to your OCP cluster and run the command oc get po -n cp4waiops to check the pod status? If some pods are not running, can you run oc logs for one of them and append the log here?
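
The check requested above can be sketched as a small shell helper. This is only a sketch under assumptions: the function names not_running_pods and dump_logs are hypothetical, it needs a logged-in oc session, and filtering on status.phase also lists pods that completed successfully, since their phase is Succeeded rather than Running.

```sh
# Hypothetical helper: print the names (pod/<name>) of all pods in
# namespace $1 whose phase is not Running.
not_running_pods() {
  oc get pods -n "$1" --field-selector=status.phase!=Running -o name
}

# Hypothetical helper: print the last 50 log lines of every such pod.
# A pod that never started has no logs, so failures are ignored.
dump_logs() {
  for pod in $(not_running_pods "$1"); do
    echo "== $pod =="
    oc logs -n "$1" "$pod" --all-containers --tail=50 || true
  done
}

# Usage (assumes the cp4waiops namespace from this thread):
#   dump_logs cp4waiops
```

If the list comes back empty because the pods were never created at all, logs cannot help. In that case the namespace events (oc get events -n cp4waiops --sort-by=.lastTimestamp) are usually where the creation failure is recorded.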

@gyliu513 there are none.
The error message says that it can't create pods. I think that's why we don't see anything.

Screenshot 2022-11-29 at 18 00 56

I then checked in the OpenShift interface, and this is what I see:
Screenshot 2022-11-29 at 18 03 45

Thanks @Gilles-Plaquet, this seems to be a permission issue, but that is weird since you already have the cluster admin permission for Argo CD. Let me dig more.

In the meantime, can you run oc get pods -n rook-ceph to make sure all rook ceph pods are running well?
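
One way to separate an RBAC problem from a security context constraint (SCC) problem is to ask the API server directly whether a given identity may create pods. The sketch below wraps oc auth can-i with impersonation. The function name can_create_pods is hypothetical, and the service account in the usage comment is an assumption (the usual default for the OpenShift GitOps application controller), so it may differ on your cluster.

```sh
# Hypothetical helper: ask whether identity $1 may create pods in
# namespace $2. Prints "yes" or "no". Requires a logged-in oc session
# whose user is allowed to impersonate other identities.
can_create_pods() {
  oc auth can-i create pods -n "$2" --as="$1"
}

# Usage (the service account name is an assumption, adjust as needed):
#   can_create_pods \
#     system:serviceaccount:openshift-gitops:openshift-gitops-argocd-application-controller \
#     cp4waiops
```

If this prints "yes" while pod creation is still reported as forbidden, that would suggest the rejection comes from SCC admission rather than from missing role bindings.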

@gyliu513 exactly, that was my reasoning as well: it looks like a permission issue, but I have all the cluster permissions.
Thanks already for the help!

I also ran the command, and everything in the rook-ceph cluster looks fine (to me).
Screenshot 2022-11-29 at 18 34 28

@Gilles-Plaquet let me check more with @huang-cn and @morningspace. They are located in China, so I hope we can give you more info tomorrow. Thanks!

Thanks a lot already!

@Gilles-Plaquet
I see you mentioned that you are deploying both the Event Manager and the AI Manager. May I know which install option you are using: installing them one by one, or using the all-in-one template? Also, may I know which release you are deploying? Can you share the output of oc get csv for the namespaces cp4waiops and ibm-common-services?

@morningspace

I used the one-by-one installation, since the other one was in technical preview. I opted for release 3.5.

  • oc get csv -n cp4waiops output:

Screenshot 2022-11-30 at 08 58 08

  • oc get csv -n ibm-common-services output:

Screenshot 2022-11-30 at 08 59 38

@huang-cn did a test using the 3.5 release today and it worked without problems, so I guess there must be something different on your cluster. I will check with @huang-cn and keep you posted tomorrow.

@morningspace Thanks a lot!
If a Webex or Zoom call (or similar) would make it easier to solve the issue, that is of course possible!

@Gilles-Plaquet I don't understand why this runAsUser: Invalid value: 1001 error appears here. AIOps should not use the runAsUser SCC option at all; it shouldn't specify any UID value and should let OCP allocate one. I'm wondering whether the catalog image in this environment is the same as in ours.
Could you run the commands below to check the catalogsource image and the operator SCC settings?

oc -n openshift-marketplace get catalogsource ibm-operator-catalog -o yaml | grep image:

oc -n cp4waiops get deploy iaf-core-operator-controller-manager -o yaml | grep -v 'f:securityContext' | grep securityContext -A8

oc -n ibm-common-services get deploy ibm-common-service-operator -o yaml | grep -v 'f:securityContext' | grep securityContext -A8
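
As background for the runAsUser error: under OpenShift's restricted SCC, a pod's UID is normally validated against the range stored in the namespace annotation openshift.io/sa.scc.uid-range, written as start/size. The sketch below reads that annotation and checks whether a UID such as 1001 falls inside the range. The helper names namespace_uid_range and uid_in_range are hypothetical, and the range used in the example is purely illustrative, not taken from this cluster.

```sh
# Hypothetical helper: print the UID range annotation of namespace $1,
# in the form "start/size". Requires a logged-in oc session.
namespace_uid_range() {
  oc get namespace "$1" \
    -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
}

# Hypothetical helper: succeed if UID $2 lies inside range $1 ("start/size").
uid_in_range() {
  start="${1%/*}"   # part before the slash
  size="${1#*/}"    # part after the slash
  [ "$2" -ge "$start" ] && [ "$2" -lt "$((start + size))" ]
}

# Example with an illustrative range (not taken from this cluster):
if uid_in_range "1000650000/10000" 1001; then
  echo "UID 1001 is inside the range"
else
  echo "UID 1001 is outside the range"
fi
```

Against a live cluster the two helpers can be combined as uid_in_range "$(namespace_uid_range cp4waiops)" 1001. A fixed UID outside the range would be consistent with the Invalid value error, assuming the pod is admitted under the restricted SCC.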

  • oc -n openshift-marketplace get catalogsource ibm-operator-catalog -o yaml | grep image:

Screenshot 2022-12-01 at 14 26 16

  • oc -n cp4waiops get deploy iaf-core-operator-controller-manager -o yaml | grep -v 'f:securityContext' | grep securityContext -A8

Screenshot 2022-12-01 at 14 29 07

  • oc -n ibm-common-services get deploy ibm-common-service-operator -o yaml | grep -v 'f:securityContext' | grep securityContext -A8

Screenshot 2022-12-01 at 14 32 00

Currently the namespace does not exist; however, yesterday it did (see the post above). This might be because OpenShift was unable to install the operator.

@Gilles-Plaquet AIOps never uninstalls the ibm-common-services components unless you remove them manually, so this is weird. We can talk next Monday to dig more; I hope that is OK. Thanks!