google / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.

Graceful Error Message on Failing To Apply Kueue

rwitten opened this issue

Customers keep reporting this error:

Error from server (Forbidden): roles.rbac.authorization.k8s.io "jobset-leader-election-role" is forbidden: User "100551137677522204023" cannot patch resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "jobset-system": requires one of ["container.roles.update"] permission(s).
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "jobset-manager-role" is forbidden: User "100551137677522204023" cannot patch resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoles.update"] permission(s).
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "jobset-metrics-reader" is forbidden: User "100551137677522204023" cannot patch resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoles.update"] permission(s).
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "jobset-proxy-role" is forbidden: User "100551137677522204023" cannot patch resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoles.update"] permission(s).
Error from server (Forbidden): rolebindings.rbac.authorization.k8s.io "jobset-leader-election-rolebinding" is forbidden: User "100551137677522204023" cannot patch resource "rolebindings" in API group "rbac.authorization.k8s.io" in the namespace "jobset-system": requires one of ["container.roleBindings.update"] permission(s).
Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io "jobset-manager-rolebinding" is forbidden: User "100551137677522204023" cannot patch resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoleBindings.update"] permission(s).
Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io "jobset-proxy-rolebinding" is forbidden: User "100551137677522204023" cannot patch resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoleBindings.update"] permission(s).
[XPK] Task: `Set Jobset On Cluster` terminated with code `1`
[XPK] jobset command on server side returned with ERROR returncode 1.

It would be helpful if one of our messages warned them that the failure is probably a permissions problem: their account is not allowed to update RBAC roles and role bindings on the cluster.
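One way to produce such a warning is to scan the failed command's output for `Forbidden` errors and summarize the permissions they name. The following is only a minimal sketch, not xpk's actual code: `missing_permissions`, `permission_hint`, and the wording of the hint are hypothetical, and it assumes the command output is available as a string in the format shown above.

```python
import re

# Hypothetical helpers, not part of xpk. They assume the kubectl output
# format shown in this issue.
FORBIDDEN_LINE = re.compile(
    r'Error from server \(Forbidden\):.*requires one of \[([^\]]*)\] permission\(s\)'
)
QUOTED = re.compile(r'"([^"]+)"')


def missing_permissions(output: str) -> list:
    """Return the distinct permissions named in Forbidden errors, in first-seen order."""
    seen = []
    for line in output.splitlines():
        match = FORBIDDEN_LINE.search(line)
        if not match:
            continue
        # The bracketed list may name more than one permission.
        for perm in QUOTED.findall(match.group(1)):
            if perm not in seen:
                seen.append(perm)
    return seen


def permission_hint(output: str) -> str:
    """Build a one-line warning, or return '' if no Forbidden errors were found."""
    perms = missing_permissions(output)
    if not perms:
        return ''
    # Hypothetical message wording, for illustration only.
    return (
        '[XPK] The command was rejected with Forbidden errors. Your account may '
        'lack permission to update RBAC resources on this cluster. '
        'Permission(s) named in the errors: ' + ', '.join(perms)
    )
```

If xpk called something like `permission_hint` on the output whenever a task returned a non-zero code, it could print the hint next to the existing `terminated with code` message instead of leaving users to read the raw server errors.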
