Ilyas Moutawwakil's starred repositories
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
llm-perf-backend
The backend behind the LLM-Perf Leaderboard
optimum-amd
AMD related optimizations for transformer models
scrape-open-llm-leaderboard
Scrape and export data from the Open LLM Leaderboard.
TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those TensorRT engines.
GPU-Puzzles
Solve puzzles. Learn CUDA.
text-embeddings-inference
A blazing-fast inference solution for text embedding models
optimum-quanto
A PyTorch quantization backend for Optimum
attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
llm-vscode
LLM-powered development for VSCode
diffusion-models-class
Materials for the Hugging Face Diffusion Models Course
cuda-python
CUDA Python Low-level Bindings
onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
private-gpt
Interact with your documents using the power of GPT, 100% privately, no data leaks