LittleQili

Shanghai Jiao Tong University

Shanghai, China

Organizations

SJTU-CSE

Yijia Diao's starred repositories

orion

An interference-aware scheduler for fine-grained GPU sharing

Language: Python · License: MIT · Stars: 90

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stars: 4,137

nccl-tests

NCCL Tests

Language: Cuda · License: BSD-3-Clause · Stars: 814

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stars: 21,567

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ · License: Apache-2.0 · Stars: 1,642

llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Language: Python · License: Apache-2.0 · Stars: 336

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stars: 2,322

Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Language: C++ · License: NOASSERTION · Stars: 246

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 26,263

dlrover

DLRover: An Automatic Distributed Deep Learning System

Language: Python · License: NOASSERTION · Stars: 1,194

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language: Python · License: MIT · Stars: 551

GPU-scheduler-for-deep-learning

Language: C++ · License: MIT · Stars: 192

bamboo

Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.

Language: Python · License: MIT · Stars: 46

TGS

Artifacts for our NSDI'23 paper TGS

Language: Python · License: Apache-2.0 · Stars: 63

gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.

Language: C++ · License: Apache-2.0 · Stars: 5,899

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stars: 5,237

gemma

Open-weights LLM from Google DeepMind.

Language: Python · License: Apache-2.0 · Stars: 2,382

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Language: Cuda · License: BSD-3-Clause · Stars: 525

PipeSwitch

PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications

Language: Python · License: Apache-2.0 · Stars: 124

glet

Language: C++ · Stars: 40

mig-parted

MIG Partition Editor for NVIDIA GPUs

Language: Go · License: Apache-2.0 · Stars: 162

MIGProfiler

Multi-Instance-GPU profiling tool

Language: Jupyter Notebook · License: MIT · Stars: 51

gdev

First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.

Language: C · License: MIT · Stars: 343

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stars: 1,124

gdev

First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.

Language: C · License: MIT · Stars: 44

slurm

Slurm: A Highly Scalable Workload Manager

Language: C · License: NOASSERTION · Stars: 2,572

TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Language: C++ · License: Apache-2.0 · Stars: 10,534

pygmtools

A Python Graph Matching Toolkit.

Language: Python · License: NOASSERTION · Stars: 289

nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

Language: Go · License: Apache-2.0 · Stars: 2,194

hidet

An open-source, efficient deep learning framework/compiler, written in Python.

Language: Python · License: Apache-2.0 · Stars: 645