helfrichmichael / prusaslicer-novnc

Simple Docker container that serves Prusaslicer via noVNC in your web browser.

[FR] Container GPU passthrough

fakezeta opened this issue · comments

As per our conversation on Prusa3D Forum enable 3D acceleration with GPU passthrough.

Sorry for the mega delay. Meant to look at this over the weekend, but was swamped with other stuff.

I think https://github.com/linuxserver/docker-kasm/blob/master/Dockerfile has a lot of really useful NVidia Docker bits that I plan to learn from and adapt the current Dockerfiles for my containers.

Basically I think I missed including the NVidia Toolkit package https://github.com/NVIDIA/nvidia-container-toolkit and its dependencies.

Hoping I can poke this later after work.

This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc

Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.

Okay, got it running.
Using the default image and just doing an apt install for prusa-slicer (v2.4).

https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc

image

But the caveat is that to use VirtualGL, you need to run a minimal X server on your headless machine and set up virtualgl_server. Still, it looks awesome now. I think the next bit of work will be to trim this down, similar to what you have done in your Dockerfile. Edit: figured this out; we no longer need this.

Peek.2024-04-02.15-02.mp4

> This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc
>
> Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.

TurboVNC / TigerVNC seems able to fit the bill.

Peek.2024-04-02.15-02.mp4

Thanks a ton for the work on this so far! It's looking super smooth for the slicing view now. Feel free to send a pull request if you'd like and I'm happy to review and merge 😄 .

> This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc
>
> Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.
>
> TurboVNC / TigerVNC seems able to fit the bill.

Neither of those provides a web browser package though, correct? Ideally that's something we'd probably like to retain for the repos.

Thanks again for the work on this!

Not sure if I can do a PR against your repo; it will be all new, I think.

https://github.com/damanikjosh/virtualgl-turbovnc-docker/blob/main/Dockerfile uses a base

ARG UBUNTU_VERSION=22.04

FROM nvidia/opengl:1.2-glvnd-runtime-ubuntu${UBUNTU_VERSION}

So it's very bloated, being based on Ubuntu. However, it has all the same bits; instead of Openbox it uses another desktop environment (Lubuntu), but it has VNC and noVNC (in my video you can see it's all in a browser). I will fork this and see what I can do, but this will only work with NVIDIA GPUs, obviously.

https://gist.github.com/vajonam/d1e713bcfd47e03f27549258ef53690e <- WIP, but works for the most part; I have added some of your code. I think I should be able to submit a PR. Standby. Not too different after all; Ubuntu/Debian is still a bit bloated.

Still need to add back supervisord; will work on that next.

Okay, I have a working version with supervisor etc.; some fine tuning is needed for passing environment variables. Look for a PR shortly. This should work regardless of NVIDIA, but worst case you might have two Dockerfiles: one for NVIDIA GPU and one for CPU.

Added #15 to address this.

Thanks for the work so far. I just pulled the latest commit(s) and I am unable to run this via CLI (for unraid and similar I am making sure the templates match up and trying to figure out the migration path for this set of changes).

My guess is this is due to the supervisord.conf changes

2024-04-03 17:25:13 Error: Format string '/opt/TurboVNC/bin/vncserver %(ENV_DISPLAY)s -fg  %(ENV_VNC_SEC)s -depth 24 -geometry %(ENV_VNC_RESOLUTION)s' for 'program:vnc.command' contains names ('ENV_VNC_RESOLUTION') which cannot be expanded. Available names: ENV_DEBIAN_FRONTEND, ENV_DISPLAY, ENV_HOME, ENV_HOSTNAME, ENV_LC_CTYPE, ENV_LD_LIBRARY_PATH, ENV_LOCALFBPORT, ENV_NOVNC_PORT, ENV_NVIDIA_DRIVER_CAPABILITIES, ENV_NVIDIA_VISIBLE_DEVICES, ENV_PATH, ENV_PWD, ENV_SHLVL, ENV_SSL_CERT_FILE, ENV_SUPD_LOGLEVEL, ENV_VGLRUN, ENV_VGL_DISPLAY, ENV_VNC_PORT, ENV_VNC_SEC, group_name, here, host_node_name, numprocs, process_num, program_name in section 'program:vnc' (file: '/etc/supervisord.conf')
2024-04-03 17:25:13 For help, use /usr/bin/supervisord -h
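
For reference, supervisord resolves `%(ENV_x)s` format strings against its environment when it parses the config, so every referenced variable must exist before supervisord starts. A sketch of the failing section, reconstructed from the error message above (the actual file contents may differ):

```ini
; supervisord expands %(ENV_NAME)s when it parses the config file, so if
; VNC_RESOLUTION is unset at startup, parsing fails with the error above.
[program:vnc]
command=/opt/TurboVNC/bin/vncserver %(ENV_DISPLAY)s -fg %(ENV_VNC_SEC)s -depth 24 -geometry %(ENV_VNC_RESOLUTION)s
```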

Command I am running FWIW:

docker run --detach --volume=prusaslicer-novnc-data:/configs/ --volume=prusaslicer-novnc-prints:/prints/ -p 8080:8080 -e SSL_CERT_FILE="/etc/ssl/certs/ca-certificates.crt" --gpus all --name=prusaslicer-novnc prusaslicer-novnc

Playing with this a bit more on my end, but once it's ready for review, let me know and I can take a pass :).

this is the environment variables I am passing

    prusaslicer:
      # image: mikeah/prusaslicer-novnc
      image: cr.localdomain.com/prusa-new
      container_name: prusaslicer
      environment:
        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - SUPD_LOGLEVEL=INFO # TRACE
        - VNC_RESOLUTION=1900x1200
      volumes:
        - /opt/docker/configs/prusaslicer/config:/configs
        - /opt/docker/configs/prusaslicer/prints:/prints
      restart: unless-stopped

I think you were missing VNC_RESOLUTION. It should default if not set; not sure why that is not happening, will have a look. Be sure to add all the environment variables; you should be able to pass them as -e FOO=BAR:

      environment:
        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - SUPD_LOGLEVEL=INFO # TRACE
        - VNC_RESOLUTION=1900x1200

Just added an export to make sure it's defaulted if not set.
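
A minimal sketch of that kind of defaulting (the 1280x800 fallback and the entrypoint shape are assumptions, not the actual values in the PR):

```shell
#!/bin/sh
# Default VNC_RESOLUTION before supervisord starts, so that the
# %(ENV_VNC_RESOLUTION)s expansion in supervisord.conf always has a value.
# The fallback value here is a placeholder, not the real default.
export VNC_RESOLUTION="${VNC_RESOLUTION:-1280x800}"
echo "VNC_RESOLUTION=${VNC_RESOLUTION}"
# exec /usr/bin/supervisord -c /etc/supervisord.conf   # then hand off to supervisord
```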

I am assuming that you have an NVIDIA GPU in your installation. I haven't tested this without one, but the image is an NVIDIA image and needs nvidia-docker2, from what I understand.

@helfrichmichael did you get it running after the export of the param? Not sure why I forgot that. Anyhow, I had a couple of questions/suggestions.

  1. Move to GTK3? Performance seems quite good with EGL/VirtualGL acceleration; any reason you chose to stick with GTK2?
  2. We can include SuperSlicer in here too; I know you have a branch. I was thinking which slicer to launch could be selected by a runtime env variable.

> @helfrichmichael did you get it running after the export of the param? Not sure why I forgot that. Anyhow, I had a couple of questions/suggestions.
>
>   1. Move to GTK3? Performance seems quite good with EGL/VirtualGL acceleration; any reason you chose to stick with GTK2?
>   2. We can include SuperSlicer in here too; I know you have a branch. I was thinking which slicer to launch could be selected by a runtime env variable.

Yep, once the param was exported, it worked just fine (I had also tried passing it as a command line env prior to this, FWIW).

For SuperSlicer and the other slicers, I am happy to replicate this over to those once I've reviewed and merged the code, unless you have capacity to update those. No pressure either way, but this work should be a great base for GPU passthrough in these apps.

Ideally, for now, I think keeping them separate would be best, just to avoid having to provide migration paths for those on existing unraid templates etc. (I find template updates a bit nuanced to do, TBH).

The only other thing I am curious about is finding a way to allow automatic VNC resizing, as this has been immensely useful for me when I go from device to device (I have a Mimo Vue touchscreen on my desktop in the garage for the printers that is fairly low-res for easy presses). I haven't looked into how noVNC accomplishes this, but if we can solve for either autoresizing or a static size, that would be amazing.

Thanks again @vajonam , really appreciate your help and dedication on this effort.

> For SuperSlicer and the other slicers, I am happy to replicate this over to those once I've reviewed and merged the code, unless you have capacity to update those. No pressure either way, but this work should be a great base for GPU passthrough in these apps.

Excellent.

> Ideally, for now, I think keeping them separate would be best, just to avoid having to provide migration paths for those on existing unraid templates etc. (I find template updates a bit nuanced to do, TBH).

Agreed.

> The only other thing I am curious about is finding a way to allow automatic VNC resizing, as this has been immensely useful for me when I go from device to device (I have a Mimo Vue touchscreen on my desktop in the garage for the printers that is fairly low-res for easy presses). I haven't looked into how noVNC accomplishes this, but if we can solve for either autoresizing or a static size, that would be amazing.

I am not sure I understand, but it looks like it auto-resized the window. Sadly, the right panel in PrusaSlicer isn't resizable; we might have to move to a modern view.

> Thanks again @vajonam, really appreciate your help and dedication on this effort.

Yeah, no problem, you're welcome. For the most part this was driven by need: I had some complex files, a few MB in size, that the software renderer just couldn't handle when it came to 3D. This makes it awesome! The previous solution was good for the simple stuff.

To disable VirtualGL, run with VGLRUN= (empty) and you should see it switch back to the Mesa software renderer and the older performance. I will maybe change the name of the param to something like ENABLEHWGPU=true to make it more user friendly. I have been using this for the past few days to do some slicing and printing; it works really well!
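
A rough sketch of how such a toggle can work in the launcher (the variable handling and paths here are assumptions based on this thread, not the actual script):

```shell
#!/bin/sh
# If VGLRUN is unset, default to "vglrun" (VirtualGL hardware acceleration).
# If the user sets VGLRUN= (empty), the ${VAR-default} form keeps it empty,
# so the slicer starts without vglrun and falls back to Mesa software rendering.
VGLRUN="${VGLRUN-vglrun}"
SLICER="/slic3r/slic3r-dist/bin/prusa-slicer"
echo "launching: ${VGLRUN} ${SLICER}"
# exec ${VGLRUN} "${SLICER}"
```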

Oh wait. I'm just opening the wrong VNC file I think (we should adjust the default file for the HTTP server probably if we can).

http://localhost:8080/vnc_lite.html?resize=true seemed to render it flawlessly! I am having an issue opening the vnc.html file so I need to look at that.

I am going to try to review this after work so I can give this a stamp of approval.

This is awesome to see so far along!

> Oh wait. I'm just opening the wrong VNC file I think (we should adjust the default file for the HTTP server probably if we can).
>
> http://localhost:8080/vnc_lite.html?resize=true seemed to render it flawlessly! I am having an issue opening the vnc.html file so I need to look at that.
>
> I am going to try to review this after work so I can give this a stamp of approval.
>
> This is awesome to see so far along!

To account for this, I will likely make the following PR:

Dockerfile:

# Add a default file to resize, etc for noVNC.
ADD vncresize.html /usr/share/novnc/index.html

vncresize.html:

<html>
    <head>
        <script>
            window.location.replace("./vnc.html?autoconnect=true&resize=remote&reconnect=true&show_dot=true");
        </script>
    </head>
</html>

@vajonam just pushed to Docker. Successfully set it up on my unraid server with an RTX 3070. It's not picking up the GPU, though, it seems, so I need to dive into this a bit more.

Variables I set:

NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=all
ENABLEHWGPU=true

I'll keep poking at this when I have some time.

You might be missing some permissions on the host system around device access. There is a tool called vglserver_config that can help you set that up; it's part of the virtualgl package.

Does nvidia-smi -l show this line on the host?

|    1   N/A  N/A   2955483      G   /slic3r/slic3r-dist/bin/prusa-slicer         92MiB |

Just pulled your latest image, and it works nicely in my environment.

> Does nvidia-smi -l show this line on the host?
>
> |    1   N/A  N/A   2955483      G   /slic3r/slic3r-dist/bin/prusa-slicer         92MiB |

Sadly, no. I see "No running processes found" for all of the entities. In binhex-plexpass, for example, I see the GPU passthrough just fine. I can try to poke at this more after work.

This is VirtualGL passthrough, not regular GPU passthrough, which is a bit different. Let me know what you find.

For full context, here are my unraid variables surrounding GPU acceleration:
image

Additionally I tried running the container as privileged to no avail.

This is what I am using in my docker compose; maybe you need to pass VGL_DISPLAY=egl:

        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - ENABLEHWGPU=true
        - SUPD_LOGLEVEL=INFO
        - VNC_RESOLUTION=1900x1200

These are important.

        - VGL_DISPLAY=egl
        - ENABLEHWGPU=true

> These are important.
>
>         - VGL_DISPLAY=egl
>         - ENABLEHWGPU=true

Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.

Regarding vglserver_config are you saying I need to set this up on the host (not the docker container)?

Yes, you need it on the host to ensure the devices have the right permissions for accessing the card. All it does in this case is set up permissions on the cards and make sure the user the docker daemon runs as can access them.

> These are important.
>
>         - VGL_DISPLAY=egl
>         - ENABLEHWGPU=true
>
> Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.
>
> Regarding vglserver_config, are you saying I need to set this up on the host (not the docker container)?

Assuming you set ENABLEHWGPU to true as well?

> Yes, you need it on the host to ensure the devices have the right permissions for accessing the card. All it does in this case is set up permissions on the cards and make sure the user the docker daemon runs as can access them.

Hmmm, that might add complexity for unraid, since I can't find that as a supported approach, and I believe it spins up an X server, if I'm not mistaken? I'll have to look into this after work.

> These are important.
>
>         - VGL_DISPLAY=egl
>         - ENABLEHWGPU=true
>
> Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.
>
> Regarding vglserver_config, are you saying I need to set this up on the host (not the docker container)?
>
> Assuming you set ENABLEHWGPU to true as well?

Correct, I have set both of those on my template.

There is no need for an X server on the host; it just uses EGL (VirtualGL) to use the card and render into the VNC-based X server.

I got some time just now to play with this a bit more and the solution to my problems wasn't enabling anything further with VirtualGL/vglserver.

In fact, it was just adding --runtime=nvidia under "Extra Parameters", and it's working flawlessly now.
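
For anyone landing here later, the working invocation is essentially the earlier docker run command plus --runtime=nvidia and the GPU-related variables discussed in this thread (volume names, port, and image tag are from my setup; adjust them for yours):

```shell
# Earlier docker run command with --runtime=nvidia added, plus the
# GPU-related environment variables from this thread.
docker run --detach \
  --runtime=nvidia --gpus all \
  --volume=prusaslicer-novnc-data:/configs/ \
  --volume=prusaslicer-novnc-prints:/prints/ \
  -p 8080:8080 \
  -e SSL_CERT_FILE="/etc/ssl/certs/ca-certificates.crt" \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e VGL_DISPLAY=egl \
  -e ENABLEHWGPU=true \
  --name=prusaslicer-novnc prusaslicer-novnc
```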

Amazing work @vajonam!

image
image

Feel free to re-open this if anyone is experiencing issues, but I believe this is good to go 🥳 .