InkdyeHuang's starred repositories

MaxKB

🚀 A knowledge-base question-answering system built on large language models (LLMs). Ready to use out of the box, with support for quick embedding into third-party business systems; an official 1Panel product.

Language: Python · License: GPL-3.0 · Stargazers: 6508

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · Stargazers: 8137
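
As a rough illustration of how a served model is queried, here is a minimal sketch using the project's `text_generation` Python client; the server address, prompt, and token budget are placeholder assumptions, and a TGI server is assumed to be already running (for example via the official Docker image).

```python
# Minimal sketch (assumptions: a text-generation-inference server is already
# running on http://127.0.0.1:8080; install the client with `pip install text-generation`).
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# One-shot generation.
response = client.generate("What is Deep Learning?", max_new_tokens=64)
print(response.generated_text)

# Token-by-token streaming.
for chunk in client.generate_stream("What is Deep Learning?", max_new_tokens=64):
    if not chunk.token.special:
        print(chunk.token.text, end="", flush=True)
```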

Awesome-Chinese-LLM

A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at low cost, covering base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.

Stargazers: 12048

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python · License: Apache-2.0 · Stargazers: 2551
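
The description above refers to SGLang's frontend language; the sketch below shows roughly what a program written in it looks like, assuming an SGLang runtime has been launched locally (the port and prompts are placeholders, not values from this list).

```python
# Minimal sketch of SGLang's frontend DSL (assumption: an SGLang runtime server
# has been started locally, e.g. on port 30000; prompts are placeholders).
import sglang as sgl

@sgl.function
def two_questions(s, q1, q2):
    s += sgl.system("You are a helpful assistant.")
    s += sgl.user(q1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=64))
    s += sgl.user(q2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = two_questions.run(q1="What is an LLM?", q2="How is one served?")
print(state["answer_1"])
print(state["answer_2"])
```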

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python · License: Apache-2.0 · Stargazers: 2878

mamba

The Fast Cross-Platform Package Manager

Language: C++ · License: BSD-3-Clause · Stargazers: 6400

DecryptPrompt

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications.

Stargazers: 2230

self-rag

This repository includes the original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · Stargazers: 1524

punica

Serving multiple LoRA-finetuned LLMs as one

Language: Python · License: Apache-2.0 · Stargazers: 856

DB-GPT

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Language: Python · License: MIT · Stargazers: 11496

DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python · License: MIT · Stargazers: 3057

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python · License: Apache-2.0 · Stargazers: 35028
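
A common way to use FastChat's serving stack is through its OpenAI-compatible API server; the sketch below assumes that server (plus a controller and a model worker, e.g. for Vicuna) is already running locally, and the base URL, port, and model name are placeholder assumptions.

```python
# Minimal sketch (assumptions: FastChat's controller, a model worker serving
# lmsys/vicuna-7b-v1.5, and the OpenAI-compatible API server are already
# running locally on port 8000; uses the openai>=1.0 client).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Summarize what FastChat does."}],
)
print(completion.choices[0].message.content)
```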

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stargazers: 7038

Megatron-LLaMA

Best practice for training LLaMA models in Megatron-LM

Language: Python · License: NOASSERTION · Stargazers: 555

S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Language: Python · License: Apache-2.0 · Stargazers: 1543

LLMLingua

To speed up LLM inference and help LLMs perceive key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.

Language: Python · License: MIT · Stargazers: 4038
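
A minimal sketch of what prompt compression looks like with the `llmlingua` Python package; the example context, question, and token budget are illustrative assumptions, not values from the repository.

```python
# Minimal sketch of prompt compression with LLMLingua (`pip install llmlingua`).
# The context, question, and token budget below are placeholders.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads a default small LM used to score tokens

long_context = [
    "Paragraph one of a long retrieved document ...",
    "Paragraph two with mostly redundant details ...",
]

result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using the context.",
    question="What is the main conclusion?",
    target_token=200,
)
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```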

AFPQ

AFPQ code implementation

Language: Python · License: MIT · Stargazers: 15

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡

Language: Python · License: Apache-2.0 · Stargazers: 1998
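
As a rough sketch of the "chatbot within minutes" workflow, the snippet below follows the Transformers-style interface the project exposes for weight-only 4-bit loading; the model name, prompt, and generation settings are assumptions for illustration.

```python
# Minimal sketch (assumptions: model name and prompt are placeholders; the
# drop-in AutoModelForCausalLM here comes from intel_extension_for_transformers
# and quantizes weights to 4-bit on load).
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Once upon a time, a little girl", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```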

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 1614

openspg

OpenSPG is a Knowledge Graph Engine developed by Ant Group in collaboration with OpenKG, based on the SPG (Semantic-enhanced Programmable Graph) framework. Core capabilities: 1) domain-model-constrained knowledge modeling, 2) fused representation of facts and logic, 3) kNext SDK (Python): LLM-enhanced knowledge construction, reasoning, and generation.

Language: Java · License: Apache-2.0 · Stargazers: 448

LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Language: Python · License: Apache-2.0 · Stargazers: 8067

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 6995
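
Recent TensorRT-LLM releases also expose a high-level LLM API on top of the engine-building workflow; the sketch below uses that API, and the model name, sampling settings, and the availability of this API in the installed version are all assumptions for illustration.

```python
# Minimal sketch of the high-level LLM API shipped with recent TensorRT-LLM
# releases (assumptions: a recent version is installed, the model name is a
# placeholder, and engine build/quantization options are left at their defaults).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TRT engine
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does TensorRT-LLM optimize?"], params)
for out in outputs:
    print(out.outputs[0].text)
```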

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1933

ChatGLM-Finetuning

Fine-tuning of ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B on specific downstream tasks, covering Freeze, LoRA, P-Tuning, full-parameter fine-tuning, and more.

Language: Python · Stargazers: 2527

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python · License: Apache-2.0 · Stargazers: 23487