Ilyas Moutawwakil's starred repositories
privateGPT
Interact with your documents using the power of GPT, 100% privately, with no data leaks
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
text-generation-inference
Large Language Model Text Generation Inference
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
GPU-Puzzles
Solve puzzles. Learn CUDA.
diffusion-models-class
Materials for the Hugging Face Diffusion Models Course
text-embeddings-inference
A blazing-fast inference solution for text embedding models
llm-vscode
LLM-powered development for VSCode
cuda-python
CUDA Python Low-level Bindings
attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
scrape-open-llm-leaderboard
Scrape and export data from the Open LLM Leaderboard.