Beast code in Giters

zzp_miracle's starred repositories

vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

Language: C | License: MIT | 161 | 0 | 0

ScaleLLM

A high-performance inference system for large language models, designed for production environments.

Language: C++ | License: Apache-2.0 | 346 | 0 | 0

DistServe

Disaggregated serving system for Large Language Models (LLMs).

Language: Jupyter Notebook | License: Apache-2.0 | 228 | 0 | 0

llumnix

Efficient and easy multi-instance LLM serving

Language: Python | License: Apache-2.0 | 82 | 0 | 0

Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Language: Python | License: Apache-2.0 | 132 | 0 | 0

DARC

Decentralized Autonomous Regulated Company (DARC), a company virtual machine that runs on any EVM-compatible blockchain, with on-chain law system, multi-level tokens and dividends mechanism.

Language: TypeScript | License: NOASSERTION | 9405 | 0 | 0

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ | License: Apache-2.0 | 1614 | 0 | 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 130388 | 0 | 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | 956 | 0 | 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | 3801 | 0 | 0

Modern-CPP-Programming

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

Language: HTML | 11597 | 0 | 0

rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Language: C++ | License: Apache-2.0 | 475 | 0 | 0

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Language: Cuda | 1318 | 0 | 0

DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Language: Python | License: Apache-2.0 | 1810 | 0 | 0

ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs

Language: Python | License: Apache-2.0 | 13235 | 0 | 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 7870 | 0 | 0

recom

An Optimizing Compiler for Recommendation Model Inference

Language: C++ | License: Apache-2.0 | 21 | 0 | 0

sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.

Language: Python | License: Apache-2.0 | 4854 | 0 | 0

diffusers-api

Language: Python | License: MIT | 33 | 0 | 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 24534 | 0 | 0

hiq

HiQ - Observability And Optimization In Modern AI Era

Language: Python | License: NOASSERTION | 70 | 0 | 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | 11198 | 0 | 0

NeMo-Framework-Launcher

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

Language: Python | License: Apache-2.0 | 433 | 0 | 0

YesPlayMusic

A good-looking third-party NetEase Cloud Music player, supporting Windows / macOS / Linux

Language: Vue | License: MIT | 28075 | 0 | 0

BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Language: Python | License: Apache-2.0 | 541 | 0 | 0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | 14473 | 0 | 0

the-algorithm

Source code for Twitter's Recommendation Algorithm

Language: Scala | License: AGPL-3.0 | 61865 | 0 | 0

ChatGPT-Next-Web

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.

Language: TypeScript | License: MIT | 73804 | 0 | 0

Pake

🤱🏻 Turn any webpage into a desktop app with Rust. Easily build lightweight multi-platform desktop apps with Rust.

Language: Rust | License: MIT | 25106 | 0 | 0

zzpmiracle