LittleQili

Yijia Diao's starred repositories

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | Apache-2.0 | 131831 1117 15657

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | MIT | 23217 226 132

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | Apache-2.0 | 15762 104 1016

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python | MIT | 10286 66 105

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python | BSD-3-Clause | 8020 139 3698

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | 6910 68 22

corenet

CoreNet: A library for training deep neural networks

Language: Python | NOASSERTION | 6909 63 20

miniforge

A conda-forge distribution.

Language: Shell | NOASSERTION | 6132 55 362

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | Apache-2.0 | 4134 35 1335

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python | BSD-2-Clause | 2857 33 56

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | MIT | 1479 25 22

gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster

Language: Go | Apache-2.0 | 1387 39 149

AzurePublicDataset

Microsoft Azure Traces

Language: Jupyter Notebook | CC-BY-4.0 | 781 37 35

Overleaf-Workshop

Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.

Language: TypeScript | AGPL-3.0 | 443 3 93

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python | MIT | 332 13 53

DistServe

Disaggregated serving system for Large Language Models (LLMs).

Language: Jupyter Notebook | Apache-2.0 | 272 4 37

rccl

ROCm Communication Collectives Library (RCCL)

Language: C++ | NOASSERTION | 248 32 90

varuna

Language: Python | 232 8 11

flux

A fast communication-overlapping library for tensor parallelism on GPUs.

Language: C++ | Apache-2.0 | 177 7 21

Caffeine

Caffeine for macOS 11+

Language: Swift | MIT | 138 10

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++ | MIT | 101 4 53

SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Apache-2.0 | 91 2 3

orion

An interference-aware scheduler for fine-grained GPU sharing

Language: Python | MIT | 89 2 17

paella

Paella: Low-latency Model Serving with Virtualized GPU Scheduling

Language: C++ | 55 4 1

pyjuice

Scalable training and inference for Probabilistic Circuits

Language: Python | Apache-2.0 | 44 5 4

rccl-tests

RCCL Performance Benchmark Tests

Language: Cuda | NOASSERTION | 41 10 8

TiledKernel

TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.

Language: C++ | MIT | 16 20

amanda

Language: Python | NOASSERTION | 14 30

chipgptft

Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework (DAC 2024)

Language: Python | 900

RAP-artifacts

Language: C++ | 2 10