Zakor Gyula's repositories
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
AMDMIGraphX
AMD's graph optimization engine.
flash-attention
Fast and memory-efficient exact attention.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs.
client
Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
onnxruntime-inference-examples
Examples for using ONNX Runtime for machine learning inferencing.
third_party
Third-party source packages that are modified for use in Triton.
core
The core library and APIs implementing the Triton Inference Server.
onnxruntime_backend
The Triton backend for the ONNX Runtime.
backend
Common source, scripts and utilities for creating Triton backends.
models
A collection of pre-trained, state-of-the-art models in the ONNX format.