
vllm-ci

CI scripts designed to build a Pascal-compatible version of vLLM and Triton.

Installation

Note: this repository holds "nightly" builds of vLLM. Two builds in this repository may carry the same vLLM version number while being built from different source commits. Despite being "nightly", these builds are generally stable.

Note: the vllm command is an alias for python3 -m vllm.entrypoints.openai.api_server.

Note: kernels for all GPUs except Pascal have been excluded to reduce build time and wheel size. You can still use newer GPUs via tensor parallelism with Ray, by running two vLLM instances, one of which uses upstream vLLM. Open an issue if this disrupts your workflow.

To install the patched vLLM (the patched triton will be installed automatically):

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install vLLM
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ vllm

# Launch vLLM
vllm --help
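
If you use this index frequently, you can record it in pip's configuration instead of passing --extra-index-url on every command. A sketch, assuming the per-user configuration location on Linux (~/.config/pip/pip.conf; the path differs on macOS and Windows):

```ini
# ~/.config/pip/pip.conf -- per-user pip configuration (Linux)
[global]
extra-index-url = https://sasha0552.github.io/vllm-ci/
```

With this in place, a plain pip3 install vllm will also consult the patched index.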

To update a patched vLLM between builds of the same vLLM release version (e.g. 0.5.0 (commit 000000) -> 0.5.0 (commit ffffff)):

# Activate virtual environment
source venv/bin/activate

# Update vLLM
pip3 install --force-reinstall --extra-index-url https://sasha0552.github.io/vllm-ci/ --no-cache-dir --no-deps --upgrade vllm
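
The --force-reinstall and --no-cache-dir flags are needed because pip compares release version strings, not source commits. A minimal illustration of why a plain upgrade would be a no-op (the commit hashes are placeholders from the example above):

```python
# pip decides whether anything newer is available by comparing
# version strings. Two nightly builds from different commits can
# carry the same version, so without --force-reinstall pip would
# conclude there is nothing to do.
installed = "0.5.0"   # built from commit 000000 (placeholder)
available = "0.5.0"   # built from commit ffffff (placeholder)
print(available == installed)  # True -> pip sees nothing newer to install
```

--no-cache-dir additionally prevents pip from reusing a previously downloaded wheel with the same version number.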

To install aphrodite-engine with the patched triton:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install aphrodite-engine
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --extra-index-url https://downloads.pygmalion.chat/whl aphrodite-engine

# Launch aphrodite-engine
aphrodite --help

In other words, add --extra-index-url https://sasha0552.github.io/vllm-ci/ to the original installation command.
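
The extra index can also be recorded in a requirements file, so the flag does not have to be typed by hand. A sketch (pin versions as you see fit):

```
# requirements.txt
--extra-index-url https://sasha0552.github.io/vllm-ci/
vllm
```

Installing with pip3 install -r requirements.txt then resolves vllm (and the patched triton it depends on) from the patched index as well as PyPI.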

To install the patched triton separately, for use in other applications (for example, Stable Diffusion WebUIs):

To install an application that is published on PyPI and depends on triton:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ <PACKAGE NAME>

To install triton before installing an application:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ triton

If the application is already installed:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall triton

Don't forget to activate the virtual environment (if necessary) before running these commands!

License: MIT