Simiao Zhang's starred repositories
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Megatron-LM
Ongoing research training transformer models at scale
cs249r_book
Collaborative book Machine Learning Systems
LLMs_interview_notes
This repository mainly collects interview questions for Large Language Model (LLM) algorithm engineers.
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
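Linear attention replaces the softmax over all key–query pairs with a feature map φ, so a causal pass can be computed as a running recurrence in O(N) time instead of O(N²). A minimal pure-Python sketch of that recurrence is below; the ReLU+1 feature map and the function name are illustrative assumptions, not this repo's API, and the real implementations run fused Triton kernels rather than Python loops.

```python
def causal_linear_attention(qs, ks, vs):
    """O(N) causal attention via the linear-attention recurrence.

    Keeps a running matrix S = sum_t phi(k_t) v_t^T and a running
    normalizer z = sum_t phi(k_t); each output is phi(q)^T S / (phi(q)^T z).
    """
    # Illustrative positive feature map (ReLU + 1); real repos vary.
    phi = lambda vec: [max(x, 0.0) + 1.0 for x in vec]
    k_dim, v_dim = len(qs[0]), len(vs[0])
    S = [[0.0] * v_dim for _ in range(k_dim)]  # running sum of phi(k) v^T
    z = [0.0] * k_dim                          # running sum of phi(k)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        for i in range(k_dim):                 # update the running state
            z[i] += fk[i]
            for j in range(v_dim):
                S[i][j] += fk[i] * v[j]
        denom = sum(fq[i] * z[i] for i in range(k_dim)) or 1.0
        out.append([sum(fq[i] * S[i][j] for i in range(k_dim)) / denom
                    for j in range(v_dim)])
    return out
```

With a single token the output reduces to exactly that token's value vector, since numerator and denominator share the same φ(q)·φ(k) factor.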
awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
LLMAgentPapers
Must-read Papers on LLM Agents.
LLM-Agents-Papers
A repo listing papers related to LLM-based agents
llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
CrossViVit
This repository contains code for the paper "Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context"
AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
torch-discounted-cumsum
Fast Discounted Cumulative Sums in PyTorch
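The operation this repo accelerates is the discounted cumulative sum, the recurrence y[t] = x[t] + γ·y[t-1] familiar from reinforcement-learning returns. A minimal pure-Python sketch of the semantics (the repo itself ships parallel CUDA/PyTorch kernels, and this function name is an assumption, not its API):

```python
def discounted_cumsum(xs, gamma):
    """Sequential reference: y[t] = x[t] + gamma * y[t-1], with y[-1] = 0."""
    ys, acc = [], 0.0
    for x in xs:
        acc = x + gamma * acc
        ys.append(acc)
    return ys
```

With gamma = 0 this degenerates to the identity, and with gamma = 1 it is an ordinary cumulative sum.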
flash-attention
Fast and memory-efficient exact attention
vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch
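The core step of vector quantization is snapping an input vector to its nearest entry in a learned codebook. A minimal sketch of that lookup in pure Python, using squared Euclidean distance; the function name is hypothetical and the actual repo works on batched PyTorch tensors with a straight-through gradient estimator:

```python
def quantize(vec, codebook):
    """Return (index, entry) of the codebook vector nearest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))
    return idx, codebook[idx]
```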
reformer-pytorch
Reformer, the efficient Transformer, in PyTorch
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
performer-pytorch
An implementation of Performer, a linear-attention-based Transformer, in PyTorch
long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
google-research
Google Research