jeromeku's repositories
accelerated-scan
Accelerated First Order Parallel Associative Scan
ao
torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.
api-design
LivingSocial API Design Guide
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
candle
Minimalist ML framework for Rust
colab-connect
Connect to Google Colab VM from your local VSCode
cutlass
CUDA Templates for Linear Algebra Subroutines
EVT_AE
Artifacts of EVT ASPLOS'24
FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
fsdp_qlora
Training LLMs with QLoRA + FSDP
GEMM_MMA
Optimize GEMM with Tensor Cores, step by step
haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search, or conversational agent chatbots.
LLM-Training-Puzzles
What would you do with 1000 H100s...
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
punica
Serving multiple LoRA-finetuned LLMs as one
rust-telemetry-workshop
A workshop that introduces participants to a comprehensive toolkit to detect, troubleshoot and resolve issues with Rust applications.
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
toydb
Distributed SQL database in Rust, written as a learning project
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
trident
A performance library for machine learning applications.
triton
Development repository for the Triton language and compiler
unsloth
QLoRA finetuning, 5X faster with 60% less memory