jeromeku

jeromeku's repositories

triton-rs

Language: Rust, 7 10

accelerated-scan

Accelerated First Order Parallel Associative Scan

Language: Cuda, License: MIT, 0 0 0

ao

torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.

Language: Python, License: BSD-3-Clause, 0 0 0

api-design

LivingSocial API Design Guide

0 0 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python, License: MIT, 0 0 0

candle

Minimalist ML framework for Rust

Language: Rust, License: Apache-2.0, 0 0 0

colab-connect

Connect to Google Colab VM from your local VSCode

Language: Python, License: MIT, 0 0 0

colab-test

Language: Jupyter Notebook, 0 1 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++, License: NOASSERTION, 0 0 0

CutlassProgramming

Language: Cuda, 0 0 0

EVT_AE

Artifacts of EVT ASPLOS'24

Language: Python, 0 0 0

extension_builder

Language: Cuda, 0 0 0

FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

Language: Python, License: NOASSERTION, 0 0 0

fsdp_qlora

Training LLMs with QLoRA + FSDP

Language: Jupyter Notebook, License: Apache-2.0, 0 0 0

GaLore

Language: Python, License: Apache-2.0, 0 0 0

GEMM_MMA

Optimize GEMM with tensorcore step by step

0 0 0

haystack

LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Language: Python, License: Apache-2.0, 0 0 0

LLM-Training-Puzzles

What would you do with 1000 H100s...

Language: Jupyter Notebook, License: MIT, 0 0 0

neurips_llm_efficiency_challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Language: Jupyter Notebook, 0 0 0

packing-cat

Language: Python, 0 1 0

punica

Serving multiple LoRA finetuned LLMs as one

Language: Cuda, 0 0 0

pybind_example

Language: Python, License: NOASSERTION, 0 1 0

rust-telemetry-workshop

A workshop that introduces participants to a comprehensive toolkit to detect, troubleshoot and resolve issues with Rust applications.

Language: Rust, 0 0 0

stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python, License: MIT, 0 0 0

toydb

Distributed SQL database in Rust, written as a learning project

Language: Rust, License: Apache-2.0, 0 0 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python, License: Apache-2.0, 0 0 0

trident

A performance library for machine learning applications.

Language: Python, License: Apache-2.0, 0 0 0

triton

Development repository for the Triton language and compiler

Language: C++, License: MIT, 0 0 0

triton-aot

Language: C++, License: MIT, 0 1 0

unsloth

5X faster, 60% less memory QLoRA finetuning

Language: Python, License: Apache-2.0, 0 0 0