We're hiring! If you are interested in working on Triton at OpenAI, we have roles open for Compiler Engineers and Kernel Engineers.
This is the development repository of Triton, a language and compiler for writing highly efficient custom Deep-Learning primitives. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs.
The foundations of this project are described in the following MAPL2019 publication: Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations. Please consider citing this work if you use Triton!
The official documentation contains installation instructions and tutorials.
You can install the latest stable release of Triton from pip:
pip install triton
Binary wheels are available for CPython 3.7-3.11 and PyPy 3.8-3.9.
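As a quick pre-install check, the following sketch tests whether a CPython version falls inside the wheel range quoted above. The helper name `has_binary_wheel` is made up for illustration and is not part of Triton or pip. It covers only the CPython range, not PyPy.

```python
import sys

# CPython range quoted in this README for binary wheels (inclusive).
CPYTHON_MIN = (3, 7)
CPYTHON_MAX = (3, 11)


def has_binary_wheel(version=None):
    """Return True if (major, minor) lies within the documented CPython range.

    Hypothetical helper for illustration; pip itself decides which wheel to use.
    """
    major, minor = (version or sys.version_info)[:2]
    return CPYTHON_MIN <= (major, minor) <= CPYTHON_MAX


if __name__ == "__main__":
    print(has_binary_wheel())
```

If this prints `False`, pip may have no matching wheel to install, and building from source is the likely alternative.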
And the latest nightly release:
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
To install from source:
git clone https://github.com/openai/triton.git;
cd triton;
python -m venv .venv --prompt triton;
source .venv/bin/activate;
pip install ninja cmake wheel; # build-time dependencies
pip install -e python
pip install pyelftools tensorboard tqdm gymnasium
Clone CuAssembler from https://github.com/hgl71964/CuAssembler.git, then add it to PYTHONPATH:
export PYTHONPATH={path-to-CuAssembler}:{path-to-CuAssembler/bin}:{path-to-CuAssembler/CuAsm}
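The export line above joins three directories of the CuAssembler checkout. A minimal sketch of building that value programmatically follows. The function name and the `/opt/CuAssembler` path are placeholders, not something the project defines.

```python
import os
from pathlib import Path


def cuassembler_pythonpath(checkout):
    """Build the PYTHONPATH value for a CuAssembler checkout.

    Mirrors the export line above: the checkout root, then bin/, then CuAsm/.
    """
    root = Path(checkout)
    return os.pathsep.join(str(p) for p in (root, root / "bin", root / "CuAsm"))


if __name__ == "__main__":
    # "/opt/CuAssembler" is a placeholder, not a real default location.
    print("export PYTHONPATH=" + cuassembler_pythonpath("/opt/CuAssembler"))
```

Note that the export as written replaces any existing PYTHONPATH. Append `:$PYTHONPATH` if other entries need to be preserved.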
- Build Triton. During setup, make sure the versions of ptxas, cuobjdump, etc. match the versions on the host.
- Build PyTorch. Note that PyTorch also needs to match that version.
- Run pip uninstall triton to remove the copy installed by PyTorch, then rebuild.
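The version check mentioned in these steps can be partly automated. The sketch below assumes that "host version" refers to the CUDA toolkit on PATH, and that each tool prints a `release X.Y` token when run with `--version`. It only checks that the tools agree with each other, not what the Triton build actually bundles. All function names are illustrative.

```python
import re
import shutil
import subprocess

# Tools named in the step above; the "etc." in the original is left open.
TOOLS = ("ptxas", "cuobjdump")


def parse_release(version_output):
    """Extract 'X.Y' from text such as '... release 11.8, V11.8.89'.

    Assumes a 'release X.Y' token is present; returns None otherwise.
    """
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def tool_releases(tools=TOOLS):
    """Map each tool name to its reported release (None if missing or unparsed)."""
    releases = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            releases[tool] = None
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        releases[tool] = parse_release(result.stdout + result.stderr)
    return releases


if __name__ == "__main__":
    found = tool_releases()
    print(found)
    # Consistent means every tool was found and all report the same release.
    values = set(found.values())
    print("consistent:", len(values) == 1 and None not in values)
```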
- Set TRITON_BUILD_WITH_CLANG_LLD=true as an environment variable to use clang and lld. lld in particular results in faster builds.
- Set TRITON_BUILD_WITH_CCACHE=true to build with ccache.
- Pass --no-build-isolation to pip install to make nop builds faster. Without this, every invocation of pip install uses a different symlink to cmake, and this forces ninja to rebuild most of the .a files.
- vscode intellisense has some difficulty figuring out how to build Triton's C++ (probably because, in our build, users don't invoke cmake directly, but instead use setup.py). Teach vscode how to compile Triton as follows.
  - Do a local build.
  - Get the full path to the compile_commands.json file produced by the build: find python/build -name 'compile_commands.json' | xargs readlink -f
  - In vscode, install the C/C++ extension, then open the command palette (Shift + Command + P on Mac, or Shift + Ctrl + P on Windows/Linux) and open C/C++: Edit Configurations (UI).
  - Open "Advanced Settings" and paste the full path to compile_commands.json into the "Compile Commands" textbox.
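For systems without `readlink -f`, the path lookup in the steps above can be approximated in Python. This is a sketch, not part of the Triton tooling. The function name is made up, and `python/build` is taken from the find command in the steps.

```python
from pathlib import Path


def find_compile_commands(build_dir="python/build"):
    """Return resolved absolute paths of every compile_commands.json under build_dir."""
    root = Path(build_dir)
    if not root.is_dir():
        # No local build has been done yet (or the wrong working directory).
        return []
    return sorted(p.resolve() for p in root.rglob("compile_commands.json"))


if __name__ == "__main__":
    for path in find_compile_commands():
        print(path)
```

Run it from the repository root after a local build, then paste the printed path into the "Compile Commands" textbox.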
Supported Platforms:
- Linux
Supported Hardware:
- NVIDIA GPUs (Compute Capability 7.0+)
- Under development: AMD GPUs, CPUs
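To check a machine against the hardware requirement above, one option is to query the compute capability through PyTorch. The sketch below assumes PyTorch is installed and uses `torch.cuda.get_device_capability`. The helper names are illustrative, and nothing in this repository prescribes this check.

```python
MIN_CAPABILITY = (7, 0)  # "Compute Capability 7.0+" from the list above


def is_supported(capability):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= MIN_CAPABILITY


def local_capability():
    """Query the first CUDA device via PyTorch; None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    if cap is None:
        print("no CUDA device detected")
    else:
        print(f"capability {cap}: supported={is_supported(cap)}")
```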