zzp_miracle's starred repositories
vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
flashinfer
FlashInfer: Kernel Library for LLM Serving
Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
sd-webui-EasyPhoto
📷 EasyPhoto | Your Smart AI Photo Generator.
NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
YesPlayMusic
A good-looking third-party NetEase Cloud Music player, supporting Windows / macOS / Linux
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
the-algorithm
Source code for Twitter's Recommendation Algorithm
ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.