jeromeku's repositories

accelerated-scan

Accelerated First Order Parallel Associative Scan

Language: Cuda · License: MIT · Stars: 0 · Issues: 0
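
As a rough illustration of what the repository's CUDA kernels compute (this is not the repo's code), a first-order recurrence h_t = a_t·h_{t-1} + b_t can be evaluated in log-depth as an inclusive scan under an associative operator on (a, b) pairs. A minimal Python sketch:

```python
def inclusive_scan(xs, op):
    """Hillis-Steele inclusive scan: ceil(log2 n) passes, and every
    pass is fully data-parallel -- the property a GPU kernel exploits."""
    ys = list(xs)
    offset = 1
    while offset < len(ys):
        ys = [op(ys[i - offset], ys[i]) if i >= offset else ys[i]
              for i in range(len(ys))]
        offset *= 2
    return ys

def first_order_op(left, right):
    # Composing h -> a1*h + b1 then h -> a2*h + b2 gives
    # h -> (a1*a2)*h + (a2*b1 + b2), which is associative.
    (a1, b1), (a2, b2) = left, right
    return (a1 * a2, a2 * b1 + b2)
```

For example, `inclusive_scan([(0.5, 1.0)] * 3, first_order_op)` yields pairs whose second components are the hidden states h_1, h_2, h_3 starting from h_0 = 0.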

ao

torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.

Language: Python · License: BSD-3-Clause · Stars: 0 · Issues: 0

api-design

LivingSocial API Design Guide

Stars: 0 · Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 0 · Issues: 0
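
For context on what such a package does (this sketch is not GPTQ itself, which additionally corrects rounding error column by column against the layer's input statistics), the baseline it improves on is per-row round-to-nearest quantization:

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Per-row symmetric round-to-nearest quantization (the naive
    baseline; GPTQ minimizes layer output error instead)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct an approximation of the original weights.
    return q.astype(np.float32) * scale
```

The round trip `dequantize(*quantize_rtn(w))` bounds each element's error by half a quantization step, which is exactly the slack GPTQ-style methods spend more cleverly.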

candle

Minimalist ML framework for Rust

Language: Rust · License: Apache-2.0 · Stars: 0 · Issues: 0

colab-connect

Connect to Google Colab VM from your local VSCode

Languages: Python, Jupyter Notebook · License: MIT · Stars: 0 · Issues: 1

cookbook-dev

Deep learning for dummies. All the practical details and useful utilities that go into working with real models.

Language: Python · License: Apache-2.0 · Stars: 0 · Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Languages: C++, Cuda · License: NOASSERTION · Stars: 0 · Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

License: Apache-2.0 · Stars: 0 · Issues: 0

EVT_AE

Artifacts of EVT ASPLOS'24

Languages: Python, Cuda · Stars: 0 · Issues: 0

FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

Language: Python · License: NOASSERTION · Stars: 0 · Issues: 0

fsdp_qlora

Training LLMs with QLoRA + FSDP

Languages: Jupyter Notebook, Python · License: Apache-2.0 · Stars: 0 · Issues: 0

GEMM_MMA

Optimize GEMM with Tensor Cores, step by step

Stars: 0 · Issues: 0
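
The core idea behind such step-by-step GEMM tutorials is tiling: the output is accumulated block by block so each tile's operands fit in fast memory. A plain-Python sketch (NumPy standing in for the warp-level mma fragment multiply; this is illustration, not the repo's CUDA code):

```python
import numpy as np

def gemm_tiled(A, B, tile=4):
    """Blocked GEMM: C is accumulated tile by tile. On a GPU, each
    inner tile product would map to Tensor Core mma instructions."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # NumPy slicing clips at the edge, so ragged tiles work too.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C
```

The loop order and tile size are the knobs such tutorials tune: they control how often each operand tile is reloaded, which dominates performance long before the multiply itself does.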

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

License: BSD-3-Clause · Stars: 0 · Issues: 0

long-context-attention

USP: Hybrid Sequence-Parallel Attention for Long-Context Transformer Model Training and Inference

License: Apache-2.0 · Stars: 0 · Issues: 0

punica

Serving multiple LoRA-finetuned LLMs as one

Languages: Cuda, Python · License: NOASSERTION · Stars: 0 · Issues: 1

sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python · Stars: 0 · Issues: 0

stable-fast

An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python · License: MIT · Stars: 0 · Issues: 0

torchtune

A Native-PyTorch Library for LLM Fine-tuning

License: BSD-3-Clause · Stars: 0 · Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · Stars: 0 · Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stars: 0 · Issues: 1

unsloth

QLoRA fine-tuning that is 5x faster with 60% less memory

Language: Python · License: Apache-2.0 · Stars: 0 · Issues: 0