piano_123's starred repositories

rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model

Language: C++ · License: MIT · Stars: 1132

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" context length, and free sentence embeddings.

Language: Python · License: Apache-2.0 · Stars: 12067
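What lets RWKV run as an RNN at inference time is its WKV recurrence: each output is an exponentially weighted average of past values, maintained as running state instead of attending over the whole history. Below is a minimal single-channel sketch of that recurrence; the decay `w` and bonus `u` are illustrative constants, not values from the repo.

```python
import math

def wkv_recurrence(ks, vs, w=0.5, u=0.3):
    """Simplified scalar WKV recurrence (one channel, illustrative constants).

    Each output is a weighted average of the values seen so far, keyed by k,
    carried forward as O(1) running state -- the RNN-style trick that avoids
    quadratic attention at inference time.
    """
    a, b = 0.0, 0.0          # running numerator / denominator state
    outputs = []
    for k, v in zip(ks, vs):
        # the current token gets a bonus weight u; past tokens have decayed
        num = a + math.exp(u + k) * v
        den = b + math.exp(u + k)
        outputs.append(num / den)
        # fold the current token into the state; older state decays by exp(-w)
        a = math.exp(-w) * a + math.exp(k) * v
        b = math.exp(-w) * b + math.exp(k)
    return outputs
```

Because the weights are positive and normalized, each output stays within the range of the values seen so far, and the first output is exactly the first value.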

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Language: Python · License: NOASSERTION · Stars: 13035

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 4188

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Language: Python · License: Apache-2.0 · Stars: 18090

Qwen

The official repository of Qwen (通义千问), the chat and pretrained large language model developed by Alibaba Cloud.

Language: Python · License: Apache-2.0 · Stars: 12885

Qwen-VL

The official repository of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model developed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stars: 4480

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stars: 62928

stable-diffusion-webui

Stable Diffusion web UI

Language: Python · License: AGPL-3.0 · Stars: 137170

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stars: 12146

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python · License: NOASSERTION · Stars: 8148

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language: Python · License: MIT · Stars: 4448
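The "full attention" these building blocks are variations on reduces to scaled dot-product attention. Here is a pure-Python sketch of that core operation (single head, no batching); it illustrates the mechanism only and is not x-transformers' actual API.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]   # subtract max for numerical stability
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # output is the weight-averaged mix of value vectors
        out.append([sum(wj * v[i] for wj, v in zip(w, V)) for i in range(len(V[0]))])
    return out
```

Since the softmax weights sum to one, each output row is a convex combination of the value vectors, with more weight on values whose keys match the query.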

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stars: 3660

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python · License: BSD-3-Clause · Stars: 2630

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stars: 622

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stars: 7779

CUDALibrarySamples

CUDA Library Samples

Language: Cuda · License: NOASSERTION · Stars: 1451

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python · License: Apache-2.0 · Stars: 2958

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Language: Python · License: Apache-2.0 · Stars: 1813
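For orientation: GPTQ uses second-order (activation) information to choose how weights round. The naive baseline it improves on is plain round-to-nearest with a single scale per weight group, sketched below; this is an illustration of the baseline, not the paper's algorithm.

```python
def quantize_rtn(weights, bits=4):
    """Naive symmetric round-to-nearest quantization of one weight row.

    Maps floats onto 2**bits signed integer levels with a single scale.
    GPTQ improves on this by adjusting the rounding of each weight to
    compensate error on the layer's activations, which RTN ignores.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    deq = [qi * scale for qi in q]                   # reconstructed floats
    return q, scale, deq
```

Per-element reconstruction error is bounded by half the scale, which is exactly the slack GPTQ spends more cleverly.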

examples

A set of examples around MegEngine

Language: Python · License: Apache-2.0 · Stars: 29

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stars: 2201

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 24103
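vLLM's memory efficiency comes from PagedAttention: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical positions to physical blocks, much like OS virtual memory. A toy allocator sketch of the idea (class and method names are illustrative, not vLLM's API):

```python
class PagedKVCache:
    """Toy block allocator illustrating the PagedAttention idea:
    each sequence grows its KV cache one fixed-size block at a time,
    so memory is claimed on demand instead of reserved up front."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # physical block ids
        self.block_tables = {}                        # seq_id -> [physical ids]
        self.lengths = {}                             # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length == len(table) * self.block_size:    # current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())      # map a new physical block
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        # finished sequences return their blocks to the pool immediately
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

The payoff is that wasted memory per sequence is bounded by one partial block, instead of a worst-case-length contiguous reservation.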

ChatGLM-Efficient-Tuning

Fine-tuning ChatGLM-6B with PEFT (efficient ChatGLM fine-tuning based on PEFT)

Language: Python · License: Apache-2.0 · Stars: 3639

Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Language: Python · License: Apache-2.0 · Stars: 4048

LangChain-Chinese-Getting-Started-Guide

A Chinese-language getting-started tutorial for LangChain

Stars: 7183

bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python · License: MIT · Stars: 5845

ppl.nn

A primitive library for neural networks

Language: C++ · License: Apache-2.0 · Stars: 1256