ybai62868

To speed up LLM inference and improve the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Language: Python · License: MIT · 4498 33 120
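The sketch below shows what a compression ratio means in practice. Given per-token importance scores, it keeps the highest-scoring tokens in their original order until the prompt is about 1/ratio of its length. The scores are a hypothetical input. Producing good scores, for example with a small language model, is the hard part and is not shown. This is not the repository's algorithm or API.

```python
import math
from typing import List, Sequence

def compress_by_score(tokens: Sequence[str], scores: Sequence[float], ratio: float) -> List[str]:
    """Keep about len(tokens)/ratio tokens with the highest scores, preserving order.

    `scores` is a hypothetical per-token importance estimate supplied by the caller.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if not tokens:
        return []
    keep = max(1, math.ceil(len(tokens) / ratio))
    # Indices of the `keep` highest-scoring tokens.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    # Restore the original order so the compressed prompt still reads left to right.
    return [tokens[i] for i in sorted(top)]

tokens = ["The", "cat", "sat", "on", "the", "mat"]
scores = [0.1, 0.9, 0.7, 0.05, 0.1, 0.8]
print(compress_by_score(tokens, scores, ratio=2))  # ['cat', 'sat', 'mat']
```

At a 20x ratio, a 40-token prompt would be cut to 2 tokens.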

matmulfreellm

Implementation for MatMul-free LM.

Language: Python · License: Apache-2.0 · 2896 43 31
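One ingredient of MatMul-free language models is restricting dense-layer weights to the ternary set {-1, 0, +1}. With such weights, a matrix-vector product needs only additions and subtractions. The pure-Python sketch below illustrates that reduction. The function names are hypothetical. A real implementation would also need quantization with scaling factors and efficient GPU kernels, none of which is shown.

```python
from typing import List

def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Compute W @ x for a ternary W using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1 weight: add the input
            elif w == -1:
                acc -= xi  # -1 weight: subtract the input
            elif w != 0:
                raise ValueError("weights must be in {-1, 0, +1}")
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out

def dense_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Reference version using ordinary multiplications."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

W = [[1, 0, -1], [-1, 1, 1]]
x = [0.5, 2.0, 1.5]
print(ternary_matvec(W, x))  # [-1.0, 3.0]
```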

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · 1872 27 121

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · 1528 24 26

Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

Language: Python · License: Apache-2.0 · 1304 15 147

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · 1257 27 44
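In its basic, unnormalized causal form, linear attention drops the softmax. This lets the output be computed from a running state S_t = S_{t-1} + k_t v_t^T with o_t = S_t^T q_t, so cost grows linearly with sequence length. The pure-Python sketch below checks this recurrence against the equivalent O(T^2) masked form. It is an illustration, not the repository's API. The models it implements add gating, normalization, and chunked kernels on top of this.

```python
from typing import List

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def linear_attention_recurrent(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """Causal linear attention via a running d_k x d_v state S (O(T) in length)."""
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # S_t = S_{t-1} + outer(k_t, v_t)
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k_t[i] * v_t[j]
        # o_t = S_t^T q_t
        out.append([sum(q_t[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out

def linear_attention_quadratic(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """Same result via the masked O(T^2) form: o_t = sum_{s<=t} (q_t . k_s) v_s."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        o = [0.0] * d_v
        for s in range(t + 1):
            score = dot(q[t], k[s])
            for j in range(d_v):
                o[j] += score * v[s][j]
        out.append(o)
    return out

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(linear_attention_recurrent(q, k, v))  # [[1.0, 0.0], [2.0, 1.0], [5.0, 3.0]]
```

The recurrent form only has to keep S in memory, while the quadratic form is easier to parallelize over the sequence. Chunked kernels typically mix the two.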

Triton-Puzzles

Puzzles for learning Triton

Language: Jupyter Notebook · License: Apache-2.0 · 1019 10 11

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python · License: NOASSERTION · 839 38 45

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda · License: Apache-2.0 · 577 6 15

depyf

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Language: Python · License: MIT · 467 8 25

fast-whisper-finetuning

Language: Jupyter Notebook · 439 9 15

basalt

A Machine Learning framework from scratch in Pure Mojo 🔥

Language: Mojo · License: NOASSERTION · 399 12 38

timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.

Language: C++ · License: BSD-3-Clause · 328 21 179

zero-bubble-pipeline-parallelism

Zero Bubble Pipeline Parallelism

Language: Python · License: NOASSERTION · 264 6 26

xdsl

A Python Compiler Design Toolkit

Language: Python · License: NOASSERTION · 257 19 429

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · 251 6 17

matmul.c

Fast multi-threaded matrix multiplication in C

Language: C · License: MIT · 170 50
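Loop tiling (blocking) is a common starting point for fast matrix multiplication. The loops are split so that each small tile of A, B, and C is reused while it is still in cache. In a multi-threaded C version, independent output tiles can be handed to different threads. The sketch below shows that loop structure in pure Python for readability. The tile size is an arbitrary assumption, Python will not show the cache benefit, and this is not matmul.c's actual kernel.

```python
from typing import List

Matrix = List[List[float]]

def matmul_tiled(A: Matrix, B: Matrix, tile: int = 2) -> Matrix:
    """C = A @ B computed tile by tile over the (i, k, j) loops."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions must match")
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Multiply one tile of A by one tile of B, accumulating into a tile of C.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = A[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            C[i][j] += a_ik * B[k][j]
    return C

def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    """Reference triple-loop product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0], [0, 1], [1, 1]]
print(matmul_tiled(A, B))  # [[4.0, 5.0], [10.0, 11.0], [16.0, 17.0]]
```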

triton-viz

Language: Python · License: MIT · 133 7 9

cuda-tensorcore-hgemm

Language: Cuda · 101 50

ParrotServe

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Language: Python · License: MIT · 99 5 4

TiledKernel

TiledKernel is a code generation library based on macro kernels and a memory-hierarchy graph data structure.

Language: C++ · License: MIT · 18 20

sgemm_riscv

This project records the process of optimizing SGEMM (single-precision floating-point general matrix multiplication) on the RISC-V platform.

Language: C · License: MIT · 1500
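SGEMM is the single-precision case of the BLAS GEMM routine, which computes C ← αAB + βC (with optional transposes of A or B). An optimized kernel is usually checked against a simple reference implementation like the one below. The function name and the row-major list-of-lists layout are assumptions, and transposes are not handled. Python floats are double precision, so this checks the arithmetic, not float32 rounding.

```python
from typing import List

Matrix = List[List[float]]

def sgemm_reference(alpha: float, A: Matrix, B: Matrix, beta: float, C: Matrix) -> Matrix:
    """Return alpha * (A @ B) + beta * C for row-major lists of lists (no transposes)."""
    M, K = len(A), len(B)
    N = len(B[0]) if K else 0
    if any(len(row) != K for row in A) or any(len(row) != N for row in B):
        raise ValueError("A must be M x K and B must be K x N")
    if len(C) != M or any(len(row) != N for row in C):
        raise ValueError("C must be M x N")
    out = []
    for i in range(M):
        row = []
        for j in range(N):
            acc = sum(A[i][k] * B[k][j] for k in range(K))
            # As in reference BLAS, C is not read when beta == 0 (so NaNs in C are ignored).
            row.append(alpha * acc + (beta * C[i][j] if beta != 0.0 else 0.0))
        out.append(row)
    return out

print(sgemm_reference(2.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.5, [[2, 2], [2, 2]]))
# [[39.0, 45.0], [87.0, 101.0]]
```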

rvv-kernels

RISC-V Vector Kernel C/LLVM-IR generator

Language: C · License: Apache-2.0 · 5 20

Ansor-AF-DS

This repository contains the figures, tables and source code in the ICS'24 paper: "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations".

Language: Python · 500