Chen Shen's repositories
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
openmlsys-zh
"Machine Learning Systems: Design and Implementation" - Chinese edition
FasterTransformer
Transformer-related optimizations, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Lightrails
Yet another distributed training/inference framework.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Megatron-LM
Ongoing research training transformer models at scale
mini-redis
Incomplete Redis client and server implementation using Tokio - for learning purposes only
nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
og-equity-compensation
Stock options, RSUs, taxes — read the latest edition: www.holloway.com/ec
r4cppp
Rust for C++ programmers
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
The-Art-of-Linear-Algebra
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs