FlyingFlame (flyinglandlord)

Company: SJTU, SenseTime

Location: Shanghai

Home Page: flyinglaird.top

FlyingFlame's starred repositories

awesome-model-quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved; pull requests adding works (papers, repositories) that are missing are welcome.

Stargazers: 1826 · Issues: 0
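
For orientation, here is a toy PyTorch sketch of the basic operation this list surveys: symmetric per-tensor INT8 quantization and dequantization of a weight tensor. The helper names are illustrative and not taken from any listed repo; the surveyed papers refine this with per-channel scales, clipping, quantization-aware training, and so on.

```python
import torch

# Toy symmetric per-tensor INT8 quantization (illustrative only).
def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0            # map the largest magnitude to 127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print((w - dequantize_int8(q, s)).abs().max())   # worst-case quantization error
```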

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.

Language: Python · License: Apache-2.0 · Stargazers: 7770 · Issues: 0
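
A minimal sketch of the usual Accelerate training-loop pattern; the tiny model, optimizer, and random data are placeholders, and launching with `accelerate launch` picks up the device/distributed configuration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # reads the device / distributed setup from the launch config

# Placeholder model and data; any PyTorch model and DataLoader work the same way.
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device and wraps it for DDP/FSDP/DeepSpeed.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

loss_fn = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward() so mixed-precision scaling works
    optimizer.step()
```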

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 8327 · Issues: 0

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook · License: MIT · Stargazers: 11593 · Issues: 0

DenoisingDiffusionProbabilityModel-ddpm-

Possibly the simplest implementation of DDPM. You can run Main.py directly to train the UNet on the CIFAR-10 dataset and watch the denoising process.

Language: Python · License: MIT · Stargazers: 1507 · Issues: 0
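
The training step behind this kind of repo is the standard DDPM noise-prediction objective; below is a minimal, self-contained sketch of one such step. The tiny convolutional net is a stand-in for the repo's UNet and ignores the timestep, which a real model would embed.

```python
import torch
import torch.nn.functional as F

# Stand-in for the UNet noise predictor; a real model would also embed the timestep t.
class TinyEpsNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x, t):
        return self.net(x)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

model = TinyEpsNet()
x0 = torch.randn(8, 3, 32, 32)                   # stands in for a CIFAR-10 batch
t = torch.randint(0, T, (x0.size(0),))
eps = torch.randn_like(x0)

# Forward process q(x_t | x_0): x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
abar = alphas_bar[t].view(-1, 1, 1, 1)
xt = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps

# Simple DDPM objective: predict the injected noise.
loss = F.mse_loss(model(xt, t), eps)
loss.backward()
```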

LLM-Viewer

Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.

Language: Python · License: MIT · Stargazers: 283 · Issues: 0
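
As a reminder of what a roofline analysis computes, here is a tiny worked example in the same spirit; the peak numbers roughly correspond to an A100 and the helper is illustrative, not LLM-Viewer's own code or API.

```python
# Illustrative roofline check; numbers roughly correspond to an A100.
PEAK_FLOPS = 312e12        # FP16 Tensor Core peak, FLOP/s
MEM_BW = 1.55e12           # HBM bandwidth, bytes/s

def attainable_flops(flops: float, bytes_moved: float) -> float:
    """Roofline: throughput is capped by either peak compute or bandwidth * intensity."""
    intensity = flops / bytes_moved                    # arithmetic intensity, FLOP/byte
    return min(PEAK_FLOPS, intensity * MEM_BW)

# Decode-step GEMV for one 4096x4096 FP16 weight matrix at batch size 1:
flops = 2 * 4096 * 4096            # each multiply-accumulate counts as 2 FLOPs
bytes_moved = 4096 * 4096 * 2      # weight traffic dominates at batch size 1
ratio = attainable_flops(flops, bytes_moved) / PEAK_FLOPS
print(f"attainable / peak = {ratio:.3%}")  # a tiny fraction: decoding is memory-bound
```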

llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python · License: Apache-2.0 · Stargazers: 240 · Issues: 0

KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python · Stargazers: 284 · Issues: 0

ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

Language: C++ · License: MIT · Stargazers: 2818 · Issues: 0

EasyLLM

Built upon Megatron-DeepSpeed and the HuggingFace Trainer, EasyLLM reorganizes the code with a focus on usability while preserving training efficiency.

Language: Python · License: Apache-2.0 · Stargazers: 39 · Issues: 0

S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Language: Python · License: Apache-2.0 · Stargazers: 1716 · Issues: 0
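
For context, the per-adapter math S-LoRA serves at scale is the low-rank update W + (alpha/r)·B·A applied per request; below is a minimal single-layer sketch. The adapter table and `lora_forward` helper are hypothetical, and S-LoRA's batched multi-adapter kernels and unified paging are far more involved.

```python
import torch

d, r = 512, 8
W = torch.randn(d, d)          # frozen base weight

# Hypothetical per-user adapters: each is a low-rank pair (A, B); B starts at zero.
adapters = {
    "user_a": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
    "user_b": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
}

def lora_forward(x: torch.Tensor, adapter_id: str, alpha: float = 16.0) -> torch.Tensor:
    A, B = adapters[adapter_id]
    # Base path plus the low-rank update: x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = torch.randn(4, d)
y = lora_forward(x, "user_a")   # each request can select a different adapter
```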

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python · License: Apache-2.0 · Stargazers: 2413 · Issues: 0

awesome-lm-system

A summary of systems papers, frameworks, code, and tools for training or serving large models.

License: Apache-2.0 · Stargazers: 56 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 2581 · Issues: 0

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · Stargazers: 8856 · Issues: 0

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Language: Python · License: Apache-2.0 · Stargazers: 18799 · Issues: 0

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ · License: Apache-2.0 · Stargazers: 1668 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 27810 · Issues: 0
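
A minimal offline-generation example with vLLM's Python API; the checkpoint name and sampling settings are only examples.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                       # any supported HF checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The key idea behind PagedAttention is"], params)
for out in outputs:
    print(out.outputs[0].text)
```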

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stargazers: 7901 · Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · Stargazers: 132916 · Issues: 0
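
A minimal usage example with the pipeline API; the checkpoint is only an example.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Quantized LLM inference is", max_new_tokens=30)[0]["generated_text"])
```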

BUAA-CT-2022

Compiler Technology course at BUAA (Beihang University), 2022.

Language: Java · Stargazers: 15 · Issues: 0

LogicStack-LeetCode

Source code for the article series that grinds through LeetCode, from the WeChat official account 「宫水三叶的刷题日记」.

License: Apache-2.0 · Stargazers: 7310 · Issues: 0

Universe

A series of modules that enhance features of the Bilibili streaming platform.

License: Apache-2.0 · Stargazers: 883 · Issues: 0

boat4study-release-page

Release page for the Boat4Study (学舟) platform.

Stargazers: 1 · Issues: 0

iRingo

Unlock the full set of Apple features and integrated services.

Language: JavaScript · License: GPL-3.0 · Stargazers: 9351 · Issues: 0

boat4study_frontend

Front end of the Boat4Study (学舟) platform: an adaptive web app and WeChat mini program.

Language: Vue · Stargazers: 4 · Issues: 0

barracuda-frontend

Database Group Homework

Language: Vue · Stargazers: 1 · Issues: 0

playerdemo

A video player, an open-source take on PotPlayer, built to summarize video-player development techniques.

License: GPL-3.0 · Stargazers: 1 · Issues: 0