clusteradm init failed
zhujian7 opened this issue · comments
Using the latest clusteradm (installed with curl -L https://raw.githubusercontent.com/open-cluster-management-io/clusteradm/main/install.sh | bash) to init a hub cluster fails:
# /usr/local/bin/clusteradm init --bundle-version='latest' --output-join-command-file join.sh --wait
Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
Preflight check: cluster-info check Passed with 0 warnings and 0 errors
CRD successfully registered.
Registration operator is now available.
⠏ Waiting for cluster manager registration to become ready...
ClusterManager registration is now available.
Error: unexpected watch event received
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  creationTimestamp: "2023-07-20T09:45:00Z"
  generation: 1
  name: cluster-manager
  resourceVersion: "494"
  uid: 59d345b6-0d8d-45e3-ac62-c8e427ad3880
spec:
  addOnManagerImagePullSpec: quay.io/open-cluster-management/addon-manager:latest
  deployOption:
    mode: Default
  placementImagePullSpec: quay.io/open-cluster-management/placement:latest
  registrationConfiguration:
    featureGates:
    - feature: DefaultClusterSet
      mode: Enable
  registrationImagePullSpec: quay.io/open-cluster-management/registration:latest
  workImagePullSpec: quay.io/open-cluster-management/work:latest
status:
  conditions:
  - lastTransitionTime: "2023-07-20T09:45:00Z"
    message: Do not support StorageVersionMigration
    reason: StorageVersionMigrationFailed
    status: "False"
    type: MigrationSucceeded
Logs of the cluster-manager pod:
# kubectl logs -f -n open-cluster-management cluster-manager-5f49d9f787-r2rcq
...
E0720 09:45:42.272303 1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:46:23.029996 1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:46:23.032189 1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:46:23.235966 1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:47:44.952880 1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:47:44.955275 1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:47:45.159655 1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
/kind bug
Here is another instance of this error, this time occurring before the ClusterManager CR was created.
This also happens in join. We need to fix the waiting code so that it continues waiting after an unexpected event instead of failing.
drenv.commands.Error: Command failed:
command: ('clusteradm', 'init', '--feature-gates', 'ManagedClusterAutoApproval=true', '--bundle-version', 'default', '--wait', '--context', 'hub')
exitcode: 1
error:
Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
Preflight check: cluster-info check Passed with 0 warnings and 0 errors
Error: unexpected watch event received
$ kubectl get deploy cluster-manager -n open-cluster-management --context hub -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: ""
  creationTimestamp: "2024-03-31T22:56:37Z"
  generation: 1
  labels:
    app: cluster-manager
  name: cluster-manager
  namespace: open-cluster-management
  resourceVersion: "653"
  uid: 70a41fc0-18b3-451e-a4d9-4d806b155f5f
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cluster-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cluster-manager
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: failure-domain.beta.kubernetes.io/zone
            weight: 70
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: kubernetes.io/hostname
            weight: 30
      containers:
      - args:
        - /registration-operator
        - hub
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: quay.io/open-cluster-management/registration-operator:v0.13.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: registration-operator
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          runAsNonRoot: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmpdir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-manager
      serviceAccountName: cluster-manager
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: tmpdir
status:
  conditions:
  - lastTransitionTime: "2024-03-31T22:56:46Z"
    lastUpdateTime: "2024-03-31T22:56:46Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-03-31T23:06:47Z"
    lastUpdateTime: "2024-03-31T23:06:47Z"
    message: ReplicaSet "cluster-manager-9d976f8d4" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
$ kubectl get ClusterManager -n open-cluster-management --context hub
No resources found