clusteradm init failed

Question

zhujian7 opened this issue a year ago

Using the latest clusteradm (installed with curl -L https://raw.githubusercontent.com/open-cluster-management-io/clusteradm/main/install.sh | bash) to init a hub cluster fails:

# /usr/local/bin/clusteradm init --bundle-version='latest' --output-join-command-file join.sh --wait
Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
Preflight check: cluster-info check Passed with 0 warnings and 0 errors
CRD successfully registered.
Registration operator is now available.
⠏ Waiting for cluster manager registration to become ready...

ClusterManager registration is now available.
Error: unexpected watch event received

apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  creationTimestamp: "2023-07-20T09:45:00Z"
  generation: 1
  name: cluster-manager
  resourceVersion: "494"
  uid: 59d345b6-0d8d-45e3-ac62-c8e427ad3880
spec:
  addOnManagerImagePullSpec: quay.io/open-cluster-management/addon-manager:latest
  deployOption:
    mode: Default
  placementImagePullSpec: quay.io/open-cluster-management/placement:latest
  registrationConfiguration:
    featureGates:
    - feature: DefaultClusterSet
      mode: Enable
  registrationImagePullSpec: quay.io/open-cluster-management/registration:latest
  workImagePullSpec: quay.io/open-cluster-management/work:latest
status:
  conditions:
  - lastTransitionTime: "2023-07-20T09:45:00Z"
    message: Do not support StorageVersionMigration
    reason: StorageVersionMigrationFailed
    status: "False"
    type: MigrationSucceeded

Logs of the cluster-manager pod:

# kubectl logs -f -n open-cluster-management cluster-manager-5f49d9f787-r2rcq
...
E0720 09:45:42.272303       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:46:23.029996       1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:46:23.032189       1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:46:23.235966       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:47:44.952880       1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:47:44.955275       1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:47:45.159655       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope

Jian Zhu commented a year ago

/kind bug

Jian Zhu · Answer 1 · Thu Jul 20 2023 17:52:55 GMT+0800 (China Standard Time)

Would it be possible to add an e2e test for this scenario to the clusteradm repo?
@ycyaoxdu WDYT?

Nir Soffer · Answer 2 · Mon Apr 01 2024 07:12:17 GMT+0800 (China Standard Time)

Another instance of this error, this time before the ClusterManager CR was created.

This also happens in join. We need to fix the waiting code so that it continues waiting after unexpected events instead of failing.

      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'init', '--feature-gates', 'ManagedClusterAutoApproval=true', '--bundle-version', 'default', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
            Preflight check: cluster-info check Passed with 0 warnings and 0 errors
            Error: unexpected watch event received

$ kubectl get deploy cluster-manager -n open-cluster-management --context hub -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: ""
  creationTimestamp: "2024-03-31T22:56:37Z"
  generation: 1
  labels:
    app: cluster-manager
  name: cluster-manager
  namespace: open-cluster-management
  resourceVersion: "653"
  uid: 70a41fc0-18b3-451e-a4d9-4d806b155f5f
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cluster-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cluster-manager
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: failure-domain.beta.kubernetes.io/zone
            weight: 70
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: kubernetes.io/hostname
            weight: 30
      containers:
      - args:
        - /registration-operator
        - hub
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: quay.io/open-cluster-management/registration-operator:v0.13.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: registration-operator
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          runAsNonRoot: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmpdir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-manager
      serviceAccountName: cluster-manager
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: tmpdir
status:
  conditions:
  - lastTransitionTime: "2024-03-31T22:56:46Z"
    lastUpdateTime: "2024-03-31T22:56:46Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-03-31T23:06:47Z"
    lastUpdateTime: "2024-03-31T23:06:47Z"
    message: ReplicaSet "cluster-manager-9d976f8d4" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

$ kubectl get ClusterManager -n open-cluster-management --context hub
No resources found
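
The suggestion in this comment, that the waiting code should keep waiting after unexpected events, could look roughly like the sketch below. This is not clusteradm's actual code. The Event type, the Ready flag, and the WaitUntilReady function are hypothetical stand-ins that only loosely mirror the event kinds a Kubernetes watch can deliver, and the sketch uses only the Go standard library so it runs without a cluster.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// EventType is a hypothetical stand-in for the event kinds a watch can deliver.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
	Bookmark EventType = "BOOKMARK"
)

// Event is a simplified watch event. Ready stands in for whatever
// readiness check the real waiting code performs on the watched object.
type Event struct {
	Type  EventType
	Ready bool
}

var (
	ErrTimeout     = errors.New("timed out waiting for resource to become ready")
	ErrWatchClosed = errors.New("watch channel closed before resource became ready")
)

// WaitUntilReady consumes events until one reports readiness, the channel
// closes, or the timeout expires. Event types it does not expect are
// counted and skipped rather than treated as fatal errors.
func WaitUntilReady(events <-chan Event, timeout time.Duration) (int, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	skipped := 0
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return skipped, ErrWatchClosed
			}
			switch ev.Type {
			case Added, Modified:
				if ev.Ready {
					return skipped, nil
				}
				// Not ready yet: keep waiting for the next event.
			default:
				// Unexpected event type: record it and keep waiting.
				skipped++
			}
		case <-timer.C:
			return skipped, ErrTimeout
		}
	}
}

func main() {
	events := make(chan Event, 4)
	events <- Event{Type: Added}
	events <- Event{Type: Bookmark}
	events <- Event{Type: Deleted}
	events <- Event{Type: Modified, Ready: true}

	skipped, err := WaitUntilReady(events, 2*time.Second)
	fmt.Printf("skipped=%d err=%v\n", skipped, err)
}
```

The design choice here is that only a timeout or a closed channel ends the wait with an error. Whether a closed watch should instead be re-established is a separate decision that this sketch leaves open.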