camenduru / non-profit-gpu-cluster

🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU
🥳 Please join my patreon community https://patreon.com/camenduru

Motivation

A non-profit GPU cluster that runs open-source paper demos with a UI for free for everyone.

https://twitter.com/camenduru/status/1747802652182737050

  • If each person receives 24 hours of compute time per week on one 3090 or A5000 GPU, a single GPU serves 7 people (168 hours per week / 24 hours each); 2 GPUs serve 14 people and 24 GPUs serve 168 people.
  • Operating cost (electricity): about $2 per 24 hours for 2 x 3090 or A5000 GPUs.
  • End-of-the-year goal: 6 servers with a total of 24 x A5000 or 3090 GPUs.
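The capacity figures above follow from simple arithmetic: one GPU offers 24 x 7 = 168 hours per week. A minimal sketch of that calculation, where the function name `users_for_gpus` is hypothetical and only used for illustration:

```shell
#!/bin/sh
# Hours one GPU is available per week (24 hours x 7 days).
HOURS_PER_WEEK=$((24 * 7))

# users_for_gpus GPU_COUNT [HOURS_PER_USER]
# Prints how many people can each receive HOURS_PER_USER hours
# (default 24) of exclusive GPU time per week.
users_for_gpus() {
    gpus=$1
    hours_per_user=${2:-24}
    echo $(( gpus * HOURS_PER_WEEK / hours_per_user ))
}

# The three cases from the list above: 1, 2 and 24 GPUs.
for n in 1 2 24; do
    echo "${n} GPU(s): $(users_for_gpus "$n") people per week"
done
```

Passing a second argument changes the weekly quota, for example `users_for_gpus 24 12` for 12 hours per person.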

First Server Parts

  • ✔ GPU1: Asus ROG Strix RTX3090 O24G (3-slot 24GB with Liquid Cooler 2-slot)
  • ✔ GPU2: Asus ROG Strix RTX3090 O24G-W (3-slot 24GB with Liquid Cooler 2-slot)
  • ✔ Motherboard: Pro WS C621-64L SAGE (4 GPU Support)
  • ✔ CPU: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support)
  • ✔ CPU Cooler: 4U Active CPU Heat Sink LGA3647 (Narrow)
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 RDIMM 1Rx4 CL22
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 RDIMM 1Rx4 CL22
  • ✔ SSD: Lexar NM790 4TB M.2 NVMe PCIe Gen 4X4 7400-6500 MB/s
  • ✔ Power supply: Corsair AX1500i 1500 Watt 80+ Titanium
  • ✔ Case: Antec P20C-W (E-ATX)

Second Server Parts

  • GPU1: 3090 OR A5000
  • GPU2: 3090 OR A5000
  • Motherboard: Pro WS C621-64L SAGE (4 GPU Support)
  • ✔ CPU: Intel® Xeon® Silver 4110 Processor (48 Lane PCIe 3.0) (3 GPU Support)
  • CPU Cooler: 4U Active CPU Heat Sink LGA3647 (Narrow)
  • Ram: 1 x 32GB DDR4 1rx4 2933MHz or 3200MHz
  • Ram: 1 x 32GB DDR4 1rx4 2933MHz or 3200MHz
  • SSD: 4TB 6Gb/s
  • ✔ Power supply: Corsair AX1500i 1500 Watt 80+ Titanium
  • ✔ Case: Antec P20C-W (E-ATX)

Budget & Sponsors

Updates

January 31, 2024

✔ CPU1: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support) $197
✔ CPU2: Intel® Xeon® Silver 4110 Processor (48 Lane PCIe 3.0) (3 GPU Support) $33
✔ RAM1: 1 x 32GB (Micron 32GB DDR4-3200 RDIMM 1Rx4 CL22) $95
✔ SSD1: Lexar NM790 4TB M.2 NVMe PCIe Gen 4X4 7400-6500 MB/s $271

January 27, 2024

✔ Power Supply2: Corsair AX1500i 1500 Watt 80+ Titanium $227
✔ Case2: (Antec P20C-W) $89

January 24, 2024

✔ GPU2: Asus ROG Strix RTX3090 O24G-W (3-slot 24GB with Liquid Cooler 2-slot) $790

January 23, 2024

✔ 🧿 The non-profit GPU cluster is now running instantid.github.io 🥳 (operating with an old motherboard and CPU because our new CPU has not arrived yet)

January 22, 2024

✔ 🧿 The non-profit GPU cluster is now running photo-maker.github.io 🥳 (operating with an old motherboard and CPU because our new CPU has not arrived yet)

January 20, 2024

✔ GPU1: Asus ROG Strix RTX3090 O24G (3-slot 24GB with Liquid Cooler 2-slot) $747
✔ Power Supply1: Corsair AX1500i 1500 Watt 80+ Titanium $249
✔ CPU Cooler1: 4U Active CPU Heat Sink LGA3647 (Narrow) $78

January 19, 2024

✔ Motherboard1: (Pro WS C621-64L SAGE) $628
✔ RAM1: 1 x 32GB (Micron 32GB DDR4-3200 RDIMM 1Rx4 CL22) $95
✔ Case1: (Antec P20C-W) $89

Setup

SSH root login

sudo nano /etc/ssh/sshd_config
# in sshd_config, change "PermitRootLogin prohibit-password" to "PermitRootLogin yes"
sudo systemctl restart ssh
sudo passwd
sudo ufw allow ssh
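The manual edit in nano can also be scripted. The following is a sketch only, assuming GNU sed (as shipped with Ubuntu); the function name `enable_root_login` is hypothetical. It runs against a scratch file so nothing on the system changes until it is pointed at /etc/ssh/sshd_config as root.

```shell
#!/bin/sh
# enable_root_login FILE
# Rewrites any PermitRootLogin line (commented out or not) to
# "PermitRootLogin yes". Assumes GNU sed for -i and -E.
enable_root_login() {
    sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$1"
}

# Dry run on a scratch copy instead of the real config file.
tmp=$(mktemp)
printf '%s\n' 'Port 22' '#PermitRootLogin prohibit-password' > "$tmp"
enable_root_login "$tmp"
cat "$tmp"
rm -f "$tmp"
```

After changing the real file, the `systemctl restart ssh` step above is still needed for the setting to take effect.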

Ubuntu 22.04.3 LTS

apt update
apt upgrade -y
apt install build-essential software-properties-common zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev -y
apt install wget nvtop python-is-python3 python3-pip aria2 unrar -y

Cuda 12.1.0_530.30.02

lsmod | grep nouveau
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u
sudo reboot

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sh cuda_12.1.0_530.30.02_linux.run
nvidia-smi
nano /etc/ld.so.conf
ldconfig

nano .bashrc
ldconfig
nvcc --version
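The notes above open /etc/ld.so.conf and .bashrc without showing what is added. A likely candidate, based on the summary the CUDA runfile installer prints, is the toolkit's lib64 and bin directories. The sketch below only prints those lines; the install prefix /usr/local/cuda-12.1 and the function names are assumptions, not taken from this repository.

```shell
#!/bin/sh
# Assumed default install prefix of the CUDA 12.1 runfile installer.
CUDA_DIR=/usr/local/cuda-12.1

# Line to append to /etc/ld.so.conf (then run ldconfig as root).
cuda_ld_conf_line() {
    echo "${CUDA_DIR}/lib64"
}

# Line to append to ~/.bashrc so that nvcc is found on PATH.
cuda_bashrc_line() {
    echo "export PATH=${CUDA_DIR}/bin:\$PATH"
}

cuda_ld_conf_line
cuda_bashrc_line
```

Once both lines are in place and a new shell is opened, `nvcc --version` should report release 12.1.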

nvidia-smi -q | grep -i bar -A 3
https://www.nvidia.com/en-us/geforce/news/geforce-rtx-30-series-resizable-bar-support/
https://forums.developer.nvidia.com/t/enabling-resizable-bar-on-rtx-30-series-gpus-in-linux/239950

Python 3.10.12

pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 torchtext==0.16.0 torchdata==0.7.0 --extra-index-url https://download.pytorch.org/whl/cu121
pip install notebook
pip show torch notebook

Network

nano /etc/netplan/00-installer-config.yaml
netplan apply
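The contents of the netplan file are not recorded here. As an illustration only, a static-address configuration could be generated as follows; the interface name `eno1`, the addresses, the DNS servers and the function name `write_netplan` are all placeholders, not the values used on this server.

```shell
#!/bin/sh
# write_netplan FILE IFACE ADDRESS_CIDR GATEWAY
# Writes a minimal static-IP netplan configuration to FILE.
write_netplan() {
    cat > "$1" <<EOF
network:
  version: 2
  ethernets:
    $2:
      dhcp4: false
      addresses: [$3]
      routes:
        - to: default
          via: $4
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
EOF
}

# Preview with placeholder values; write to
# /etc/netplan/00-installer-config.yaml as root for real use.
tmp=$(mktemp)
write_netplan "$tmp" eno1 192.168.1.50/24 192.168.1.1
cat "$tmp"
rm -f "$tmp"
```

`netplan try` can be used instead of `netplan apply` to roll back automatically if the new settings cut off the SSH session.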

wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
dpkg -i cloudflared-linux-amd64.deb
cloudflared service install TOKEN_HERE

Docker

https://docs.docker.com/engine/install/ubuntu/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit
systemctl restart docker
https://github.com/jupyter/docker-stacks

https://github.com/camenduru/docker-stacks-foundation
https://github.com/camenduru/base-notebook
https://quay.io/repository/camenduru/docker-stacks-foundation
https://quay.io/repository/camenduru/base-notebook
docker build -t base-notebook .

Ubuntu 22.04 Python 3.10.11
timeout 4h docker container run -it --rm --gpus all -u root -e GRANT_SUDO=yes -p 1000:7860 quay.io/camenduru/base-notebook:latest
timeout 4h docker container run -it --rm --gpus device=0 -u root -e GRANT_SUDO=yes -p 1000:7860 registry.hf.space/camenduru-base-notebook:latest
timeout 4h docker container run -it --rm --gpus device=1 -u root -e GRANT_SUDO=yes -p 1000:7860 camenduru/base-notebook:latest
docker system prune -a
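The `docker container run` lines above all publish host port 1000, so only one of them can run at a time. A sketch of one way to run a container per GPU side by side is to derive the host port from the GPU index. The port scheme (1000 + index) and the function name `gpu_run_cmd` are assumptions; the script only prints the commands rather than executing them.

```shell
#!/bin/sh
# Image used in the first run command above.
IMAGE=quay.io/camenduru/base-notebook:latest

# gpu_run_cmd GPU_INDEX
# Prints a 4-hour, single-GPU run command whose host port is
# 1000 + GPU_INDEX, mapped to port 7860 inside the container.
gpu_run_cmd() {
    gpu=$1
    host_port=$((1000 + gpu))
    echo "timeout 4h docker container run -it --rm --gpus device=${gpu} -u root -e GRANT_SUDO=yes -p ${host_port}:7860 ${IMAGE}"
}

# One command per GPU of a 2-GPU server.
for gpu in 0 1; do
    gpu_run_cmd "$gpu"
done
```

Each printed line can be pasted into its own tmux window so that the sessions expire independently.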

docker cp /content/test.rar 76e35c4a6e8f:/home/jovyan/test.rar
docker exec -it 76e35c4a6e8f bash

Other

tmux ls
tmux a
tmux attach-session -t 0
tmux capture-pane -pS - > ~/tmux-buffer.txt

wget https://openrgb.org/releases/release_0.9/openrgb_0.9_amd64_bookworm_b5f46e3.deb
dpkg -i openrgb_0.9_amd64_bookworm_b5f46e3.deb
apt --fix-broken install
openrgb -m off

find / -type f -exec du -h {} + | sort -rh | head -n 20

git clone https://github.com/aristocratos/btop
cd btop
make
make install PREFIX=/usr

https://github.com/raboof/nethogs

Services

Jupyter

https://github.com/jupyterlab/jupyterlab

mkdir /content
nano /etc/systemd/system/jupyter-lab.service
systemctl daemon-reload
systemctl start jupyter-lab
systemctl enable jupyter-lab
systemctl list-unit-files --type=service --state=enabled
pip install pickleshare ipywidgets
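The unit file edited with nano above is not included in these notes. The following is a minimal sketch of what such a unit might contain, written by a small helper. The helper name `write_unit`, the jupyter path /usr/local/bin/jupyter, the port 8888 and the restart policy are assumptions, not the configuration used on this server.

```shell
#!/bin/sh
# write_unit FILE DESCRIPTION WORKDIR COMMAND...
# Writes a simple systemd service unit to FILE that runs COMMAND
# as root from WORKDIR and restarts it if it exits with an error.
write_unit() {
    unit_file=$1
    description=$2
    workdir=$3
    shift 3
    cat > "$unit_file" <<EOF
[Unit]
Description=$description
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=$workdir
ExecStart=$*
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Preview in a scratch file; the real target would be
# /etc/systemd/system/jupyter-lab.service.
tmp=$(mktemp)
write_unit "$tmp" "JupyterLab" /content \
    /usr/local/bin/jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
cat "$tmp"
rm -f "$tmp"
```

The same helper could produce the unit files for the other services listed in this section by changing the description, working directory and command.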

code-server (VS Code in the browser)

https://github.com/coder/code-server

curl -fsSL https://code-server.dev/install.sh | sh
nano /etc/systemd/system/default.target.wants/code-server@root.service
systemctl enable --now code-server@$USER
systemctl status code-server@$USER
systemctl enable code-server@$USER
systemctl list-unit-files --type=service --state=enabled

nano /root/.config/code-server/config.yaml
bind-addr: 0.0.0.0:8080
auth: none
disable-telemetry: true
cert: false

chrome://flags/#unsafely-treat-insecure-origin-as-secure

Stable Diffusion WebUI

https://github.com/AUTOMATIC1111/stable-diffusion-webui

pip install virtualenv
cd /content
virtualenv stable-diffusion-webui-venv
. /content/stable-diffusion-webui-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/stable-diffusion-webui.service
systemctl daemon-reload
systemctl start stable-diffusion-webui
systemctl stop stable-diffusion-webui
systemctl enable stable-diffusion-webui
systemctl disable stable-diffusion-webui
systemctl list-unit-files --type=service --state=enabled
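A systemd unit cannot source `activate`, but it does not need to: calling the interpreter inside the virtualenv directly has the same effect. A sketch of how the `ExecStart` line could be built for the venv layout used above (/content/NAME-venv). The function names are hypothetical, and `launch.py --listen --port 7860` is an assumed launch command for the web UI rather than the one used on this server.

```shell
#!/bin/sh
# venv_python NAME
# Prints the interpreter path inside /content/NAME-venv.
venv_python() {
    echo "/content/$1-venv/bin/python"
}

# exec_start NAME SCRIPT [ARGS...]
# Prints an ExecStart= line that runs SCRIPT with the venv's own
# interpreter, so no "activate" step is required in the unit.
exec_start() {
    name=$1
    shift
    echo "ExecStart=$(venv_python "$name") $*"
}

exec_start stable-diffusion-webui launch.py --listen --port 7860
```

Because the script path is relative, the unit's `WorkingDirectory=` must point at the cloned repository.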

Stable Cascade

https://github.com/Stability-AI/StableCascade

pip install virtualenv
cd /content
virtualenv stable-cascade-venv
. /content/stable-cascade-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/stable-cascade.service
systemctl daemon-reload
systemctl start stable-cascade
systemctl stop stable-cascade
systemctl enable stable-cascade
systemctl disable stable-cascade
systemctl list-unit-files --type=service --state=enabled

Instant ID

https://github.com/InstantID/InstantID

pip install virtualenv
cd /content
virtualenv instant-id-venv
. /content/instant-id-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/instant-id.service
systemctl daemon-reload
systemctl start instant-id
systemctl stop instant-id
systemctl enable instant-id
systemctl disable instant-id
systemctl list-unit-files --type=service --state=enabled

Forge

https://github.com/lllyasviel/stable-diffusion-webui-forge

pip install virtualenv
cd /content
virtualenv forge-venv
. /content/forge-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/forge.service
systemctl daemon-reload
systemctl start forge
systemctl stop forge
systemctl enable forge
systemctl disable forge
systemctl list-unit-files --type=service --state=enabled

Dust3r

https://github.com/naver/dust3r

pip install virtualenv
cd /content
virtualenv dust3r-venv
. /content/dust3r-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/dust3r.service
systemctl daemon-reload
systemctl start dust3r
systemctl stop dust3r
systemctl enable dust3r
systemctl disable dust3r
systemctl list-unit-files --type=service --state=enabled

TripoSR

https://github.com/VAST-AI-Research/TripoSR

pip install virtualenv
cd /content
virtualenv TripoSR-venv
. /content/TripoSR-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/triposr.service
systemctl daemon-reload
systemctl start triposr
systemctl stop triposr
systemctl enable triposr
systemctl disable triposr
systemctl list-unit-files --type=service --state=enabled

CRM

https://github.com/thu-ml/CRM

pip install virtualenv
cd /content
virtualenv crm-venv
. /content/crm-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/crm.service
systemctl daemon-reload
systemctl start crm
systemctl stop crm
systemctl enable crm
systemctl disable crm
systemctl list-unit-files --type=service --state=enabled

VisualStylePrompting

https://github.com/naver-ai/Visual-Style-Prompting

pip install virtualenv
cd /content
virtualenv VisualStylePrompting-venv
. /content/VisualStylePrompting-venv/bin/activate
deactivate

mkdir -p /content
nano /etc/systemd/system/VisualStylePrompting.service
systemctl daemon-reload
systemctl start VisualStylePrompting
systemctl stop VisualStylePrompting
systemctl enable VisualStylePrompting
systemctl disable VisualStylePrompting
systemctl list-unit-files --type=service --state=enabled
