google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

Unable to checkpoint container with `-nvproxy` after the introduction of `driverABI`

luiscape opened this issue

Description

Similar to #9363, the driverABI struct doesn't implement SaverLoader.

I applied a patch similar to #9385 and can now checkpoint containers with -nvproxy successfully (still testing restore; patch below). I'm happy to submit a PR, but I'm wondering whether this makes sense and what the implications of not saving this state are.

The patch would be made here.

// +stateify savable
type driverABI struct {
	frontendIoctl   map[uint32]frontendIoctlHandler   `state:"nosave"`
	uvmIoctl        map[uint32]uvmIoctlHandler        `state:"nosave"`
	controlCmd      map[uint32]controlCmdHandler      `state:"nosave"`
	allocationClass map[uint32]allocationClassHandler `state:"nosave"`

	useRmAllocParamsV535 bool
}

Does this make sense?

This makes sense. The driver ABI should be savable. Happy to review your PR.

Although this would imply that the container must be restored on a host with the same nvidia driver version. If the driver version can change, then the ABI would need to be rebuilt (which requires extra work).

> Although this would imply that the container must be restored on a host with the same nvidia driver version.

Gotcha. This is true in our case (for the most part :) ).

Submitted the patch here. Thanks a lot for the review.