NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

NVIDIA VGPU does not work

sdake opened this issue

NVIDIA Open GPU Kernel Modules Version

535.86.10

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Description: Debian GNU/Linux 12 (bookworm)

Kernel Release

Linux wise-a40x1-1 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA A30 24G

Describe the bug

The NVIDIA vGPU software does not work with the cloud-hypervisor virtual machine monitor. An extensive analysis has been completed, and a summary is available in the linked cloud-hypervisor issue.

To Reproduce

Try to use nvidia-vgpu with either a proprietary or open-source driver. In either case, the nvidia-vgpu-vgpu mdev control plane has odd expectations about the command line for the virtual machine monitor.

In short, nvidia-vgpu is hardcoded to QEMU. Most modern accelerated-computing startups appear to want to use cloud-hypervisor, since it offers superior performance and quality.

Bug Incidence

Always

nvidia-bug-report.log.gz

We are all working towards the same goal. I don't have the nvidia-vgpu software at this time, so I will ask the person who filed the issue to attach the nvidia-bug-report.sh output.

@dengxuehua would you be kind enough to follow this issue and provide the output of nvidia-bug-report.sh?

More Info

No response

Related issue in cloud-hypervisor: cloud-hypervisor/cloud-hypervisor#5319.

Thank you,
-steve

"I confirm that this does not happen with the proprietary driver package."

"Try to use nvidia-vgpu with either a proprietary or open-source driver."

These two statements can't both be true.

@ttabi A simple thank you for spending 20 minutes of my life reporting a bug against your product offerings would be in order.

Thank you.
-steve

Sorry for that, @sdake. Thanks for the report.

The open-gpu-kernel-modules do not yet support virtualization. We're currently working on it (it requires changes both in open-gpu-kernel-modules and in the GSP firmware); it may be a few releases before the support is added.

It arguably should be called out more prominently, but the lack of virtualization support is mentioned, buried in the GPU driver README:

http://us.download.nvidia.com/XFree86/Linux-x86_64/535.98/README/kernel_open.html

Sorry for the inconvenience.

@aritger For the record, I know this isn't the place, but GPU virtualization should REALLY not be locked out of consumer boards. At least let us create a single vGPU instance so some of us can run a GPU-enabled Windows VM or something similar.

Thanks for your feedback. I will relay your message to the appropriate team.

@aritger Did you read my request? I will repeat it so we understand each other.

Your drivers, whether proprietary or open, lack vGPU support for any hypervisor other than QEMU. The broader accelerated-computing community much prefers modern hypervisors such as cloud-hypervisor. Unfortunately, the vGPU implementation, even the paid version, is structured in a way that does not work with this hypervisor.

Thank you
-steve

I made sure the vGPU team is aware of your concern. Thank you for bringing this to our attention.