Michael Goin (mgoin)


Company: @neuralmagic

Location: Boston

Home Page: https://www.linkedin.com/in/michael-goin/

Twitter: @mgoin_



Organizations
neuralmagic

Michael Goin's starred repositories

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 14208 · Issues: 119 · Issues: 1116

shadPS4

PS4 emulator for Windows, Linux, and macOS

Language: C++ · License: GPL-2.0 · Stargazers: 10924 · Issues: 128 · Issues: 525

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python · License: NOASSERTION · Stargazers: 8645 · Issues: 76 · Issues: 556

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · Stargazers: 5659 · Issues: 109 · Issues: 1130

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 3424 · Issues: 39 · Issues: 98

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1652 · Issues: 29 · Issues: 27

ao

PyTorch native quantization and sparsity for training and inference

Language: Python · License: BSD-3-Clause · Stargazers: 1566 · Issues: 40 · Issues: 293

evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Language: Python · License: Apache-2.0 · Stargazers: 1246 · Issues: 8 · Issues: 186

dune

A shell🐚 by the beach🏖️!

Language: Rust · License: MIT · Stargazers: 1021 · Issues: 15 · Issues: 44

llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python · License: Apache-2.0 · Stargazers: 675 · Issues: 12 · Issues: 90

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 633 · Issues: 7 · Issues: 21

mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

Language: C++ · License: Apache-2.0 · Stargazers: 627 · Issues: 13 · Issues: 51

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python · License: NOASSERTION · Stargazers: 557 · Issues: 14 · Issues: 99

depyf

depyf is a tool that helps you understand and adapt to the PyTorch compiler, torch.compile.

Language: Python · License: MIT · Stargazers: 498 · Issues: 8 · Issues: 27

composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Language: C++ · License: NOASSERTION · Stargazers: 310 · Issues: 25 · Issues: 226

Minitron

A family of compressed models obtained via pruning and knowledge distillation

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: C++ · License: Apache-2.0 · Stargazers: 185 · Issues: 5 · Issues: 9

gemlite

Simple and fast low-bit matmul kernels in CUDA / Triton

Language: Python · License: Apache-2.0 · Stargazers: 140 · Issues: 7 · Issues: 4

GPTQModel

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang.

Language: Python · License: Apache-2.0 · Stargazers: 121 · Issues: 3 · Issues: 66

cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Language: Python · License: BSD-3-Clause · Stargazers: 87 · Issues: 8 · Issues: 20

TensorRT-Incubator

Experimental projects related to TensorRT

Sparse-Marlin

Boosting 4-bit inference kernels with 2:4 Sparsity

Language: Cuda · License: Apache-2.0 · Stargazers: 51 · Issues: 6 · Issues: 1

compressed-tensors

A safetensors extension to efficiently store sparse quantized tensors on disk

Language: Python · License: Apache-2.0 · Stargazers: 48 · Issues: 10 · Issues: 5

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 43 · Issues: 4 · Issues: 0

torch_cgx

PyTorch distributed backend extension with compression support

Language: C++ · License: AGPL-3.0 · Stargazers: 16 · Issues: 4 · Issues: 5

SPP

[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

Language: Jupyter Notebook · License: MIT · Stargazers: 16 · Issues: 1 · Issues: 0