microk8s 1.24 - enabling gpu addon fails
ACodingfreak opened this issue
Summary
I was previously using MicroK8s 1.29, as mentioned in #4557.
I then downgraded to MicroK8s 1.24 by removing the snap and doing a clean install of 1.24.
Now, when I enable the GPU addon, I see the following error:
mm321:~$ microk8s.enable gpu
Infer repository core for addon gpu
Enabling NVIDIA GPU
Addon core/dns is already enabled
Enabling Helm 3
Fetching helm version v3.8.0.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12.9M 100 12.9M 0 0 9671k 0 0:00:01 0:00:01 --:--:-- 9664k
Helm 3 is enabled
Checking if NVIDIA driver is already installed
Using operator GPU driver
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/5872/credentials/client.config
Error: repository name (nvidia) already exists, please specify a different name
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/5872/credentials/client.config
NAME: gpu-operator
LAST DEPLOYED: Mon Jul 1 12:01:20 2024
NAMESPACE: gpu-operator-resources
STATUS: deployed
REVISION: 1
TEST SUITE: None
NVIDIA is enabled
The output says NVIDIA is enabled, but microk8s status still lists the gpu addon as disabled, as shown below:
mm321:~$ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 10.10.26.231:19001
datastore standby nodes: none
addons:
enabled:
dashboard # (core) The Kubernetes dashboard
dns # (core) CoreDNS
ha-cluster # (core) Configure high availability on the current node
helm3 # (core) Helm 3 - Kubernetes package manager
hostpath-storage # (core) Storage class; allocates storage from host directory
ingress # (core) Ingress controller for external access
metallb # (core) Loadbalancer for your Kubernetes cluster
metrics-server # (core) K8s Metrics Server for API access to service metrics
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
community # (core) The community addons repository
gpu # (core) Automatic enablement of Nvidia CUDA
helm # (core) Helm 2 - the package manager for Kubernetes
host-access # (core) Allow Pods connecting to Host services smoothly
mayastor # (core) OpenEBS MayaStor
prometheus # (core) Prometheus operator for monitoring and logging
rbac # (core) Role-Based Access Control for authorisation
registry # (core) Private image registry exposed on localhost:32000
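The mismatch between the enable output and the status listing can be checked from a script. The following is only a minimal sketch: the function name addon_state is hypothetical, and it assumes the status output keeps the "enabled:" / "disabled:" layout shown above.

```shell
# Hypothetical helper (not part of MicroK8s): reads `microk8s status`
# output on stdin and prints "enabled", "disabled", or "unknown" for
# the addon named in $1. Assumes the section layout shown above.
addon_state() {
    awk -v addon="$1" '
        # Track which section of the addon list we are in.
        $1 == "enabled:"  { section = "enabled";  next }
        $1 == "disabled:" { section = "disabled"; next }
        # The first field of an addon line is the addon name.
        section != "" && $1 == addon { print section; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}
```

Usage on the affected node would be: microk8s status | addon_state gpu. With the status output in this report it would print "disabled", even though the enable command reported "NVIDIA is enabled".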
What Should Happen Instead?
Enabling the GPU addon should succeed.
Reproduction Steps
- Install MicroK8s 1.24 on mm231 and gpu01
- Add gpu01 to the cluster with mm231
- Run microk8s enable gpu on mm231
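The steps above can be sketched as shell commands. This is an approximation rather than the exact commands that were run: the snap channel 1.24/stable and the helper names (run, install_microk8s, add_node, enable_gpu) are assumptions. The join command for gpu01 is whatever microk8s add-node prints, so it is not reproduced here.

```shell
#!/bin/sh
# Sketch of the reproduction steps. The channel name is an assumption,
# not taken from the report.
CHANNEL="1.24/stable"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On both mm231 and gpu01: remove any existing snap, then clean install.
install_microk8s() {
    run sudo snap remove microk8s --purge
    run sudo snap install microk8s --classic --channel="$CHANNEL"
}

# On mm231: prints the join command that must then be run on gpu01.
add_node() {
    run microk8s add-node
}

# On mm231, after gpu01 has joined: enable the addon and check status.
enable_gpu() {
    run microk8s enable gpu
    run microk8s status
}
```

By default the functions only print the commands; setting DRY_RUN=0 before calling them runs the commands for real.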
Introspection Report
inspection-report-20240701_121105.tar.gz
Can you suggest a fix?
No
Are you interested in contributing with a fix?
Not Sure