jiashu-z

Jiashu's starred repositories

SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Apache-2.0 · 7100

readerwriterqueue

A fast single-producer, single-consumer lock-free queue for C++

Language: C++ · NOASSERTION · 355600

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Language: Python · Apache-2.0 · 185300

transformer-walkthrough

A walkthrough of transformer architecture code

Language: Jupyter Notebook · MIT · 29100

bam

Language: Cuda · BSD-2-Clause · 10800

intel-extension-for-deepspeed

Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note that XPU is already supported by stock DeepSpeed.

Language: C++ · MIT · 5400

generative-ai-for-beginners

18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

Language: Jupyter Notebook · MIT · 4992800

DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Language: C++ · Apache-2.0 · 497900

dlrm_datasets

Set of datasets for the deep learning recommendation model (DLRM).

MIT · 3900

sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python · 2800

eecs598

Advanced Topics on Systems for X

ppl.llm.kernel.cuda

Language: C++ · Apache-2.0 · 12100

llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Language: Python · Apache-2.0 · 30500

punica

Serving multiple LoRA finetuned LLM as one

Language: Python · Apache-2.0 · 88300

scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Language: Python · Apache-2.0 · 1133400

qdrant

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Language: Rust · Apache-2.0 · 1871300

llmperf

LLMPerf is a library for validating and benchmarking LLMs

Language: Python · Apache-2.0 · 47000

DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Language: Python · Apache-2.0 · 175100

kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

Language: HTML · NOASSERTION · 64900

how-to-optimize-gemm

row-major matmul optimization

Language: C++ · GPL-3.0 · 55500

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python · MIT · 1936300

mindsdb

The platform for building AI from enterprise data

Language: Python · NOASSERTION · 2298900

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · NOASSERTION · 923300

megablocks

Language: Python · Apache-2.0 · 110800

CUDA_Freshman

Language: Cuda · 197800

ElasticFlow

Artifacts for our ASPLOS'23 paper ElasticFlow

Language: Python · Apache-2.0 · 4900

gpu_poor

Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

Language: JavaScript · 69900

EnvPipe

Language: Python · 1900

FastCkpt

Python package for rematerialization-aware gradient checkpointing

Language: Python · Apache-2.0 · 2200

ssb-dbgen

Language: C · 10100