Yangshenâš¡Deng's starred repositories
transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
Open-Sora-Plan
This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
github-markdown-toc
Easy TOC creation for GitHub README.md
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
LLMAgentPapers
Must-read Papers on LLM Agents.
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
flashinfer
FlashInfer: Kernel Library for LLM Serving
ringattention
Transformers with Arbitrarily Large Context
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
sarathi-serve
A low-latency & high-throughput serving engine for LLMs
SwiftTransformer
High performance Transformer implementation in C++.
FastLanesGPU
Accelerating GPU Data Processing using FastLanes Compression
GPUDB-Prefetch
Source code of our DaMoN@SIGMOD 2024 paper "How Does Software Prefetching Work on GPU Query Processing?"