InkdyeHuang's starred repositories

MaxKB

🚀 A knowledge-base question-answering system built on large language models (LLMs). Ready to use out of the box, with support for quick embedding into third-party business systems; an official 1Panel product.

Language: Python · License: GPL-3.0 · Stargazers: 6508

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · Stargazers: 8137
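
As a rough illustration of how a served model is queried, here is a minimal sketch using the project's `text_generation` Python client; the server address, prompt, and token budget are placeholder assumptions, and a TGI server is assumed to be already running (for example via the official Docker image).

```python
# Minimal sketch (assumptions: a text-generation-inference server is already
# running on http://127.0.0.1:8080; install the client with `pip install text-generation`).
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# One-shot generation.
response = client.generate("What is Deep Learning?", max_new_tokens=64)
print(response.generated_text)

# Token-by-token streaming.
for chunk in client.generate_stream("What is Deep Learning?", max_new_tokens=64):
    if not chunk.token.special:
        print(chunk.token.text, end="", flush=True)
```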

Awesome-Chinese-LLM

A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at low cost, covering base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.

Stargazers: 12048

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python · License: Apache-2.0 · Stargazers: 2551
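
The description above refers to SGLang's frontend language; the sketch below shows roughly what a program written in it looks like, assuming an SGLang runtime has been launched locally (the port and prompts are placeholders, not values from this list).

```python
# Minimal sketch of SGLang's frontend DSL (assumption: an SGLang runtime server
# has been started locally, e.g. on port 30000; prompts are placeholders).
import sglang as sgl

@sgl.function
def two_questions(s, q1, q2):
    s += sgl.system("You are a helpful assistant.")
    s += sgl.user(q1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=64))
    s += sgl.user(q2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = two_questions.run(q1="What is an LLM?", q2="How is one served?")
print(state["answer_1"])
print(state["answer_2"])
```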

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python · License: Apache-2.0 · Stargazers: 2878

mamba

The Fast Cross-Platform Package Manager

Language: C++ · License: BSD-3-Clause · Stargazers: 6400

DecryptPrompt

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications.

Stargazers: 2230

self-rag

This repository includes the original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · Stargazers: 1524

punica

Serving multiple LoRA-finetuned LLMs as one

Language: Python · License: Apache-2.0 · Stargazers: 856

DB-GPT

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Language: Python · License: MIT · Stargazers: 11496

DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python · License: MIT · Stargazers: 3057

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python · License: Apache-2.0 · Stargazers: 35028
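
A common way to use FastChat's serving stack is through its OpenAI-compatible API server; the sketch below assumes that server (plus a controller and a model worker, e.g. for Vicuna) is already running locally, and the base URL, port, and model name are placeholder assumptions.

```python
# Minimal sketch (assumptions: FastChat's controller, a model worker serving
# lmsys/vicuna-7b-v1.5, and the OpenAI-compatible API server are already
# running locally on port 8000; uses the openai>=1.0 client).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Summarize what FastChat does."}],
)
print(completion.choices[0].message.content)
```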

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stargazers: 7038

Megatron-LLaMA

Best practice for training LLaMA models in Megatron-LM

Language: Python · License: NOASSERTION · Stargazers: 555

S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Language: Python · License: Apache-2.0 · Stargazers: 1543

LLMLingua

To speed up LLM inference and help LLMs perceive key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.

Language: Python · License: MIT · Stargazers: 4038
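
A minimal sketch of what prompt compression looks like with the `llmlingua` Python package; the example context, question, and token budget are illustrative assumptions, not values from the repository.

```python
# Minimal sketch of prompt compression with LLMLingua (`pip install llmlingua`).
# The context, question, and token budget below are placeholders.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads a default small LM used to score tokens

long_context = [
    "Paragraph one of a long retrieved document ...",
    "Paragraph two with mostly redundant details ...",
]

result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using the context.",
    question="What is the main conclusion?",
    target_token=200,
)
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```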

AFPQ

AFPQ code implementation

Language: Python · License: MIT · Stargazers: 15

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡

Language: Python · License: Apache-2.0 · Stargazers: 1998
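
As a rough sketch of the "chatbot within minutes" workflow, the snippet below follows the Transformers-style interface the project exposes for weight-only 4-bit loading; the model name, prompt, and generation settings are assumptions for illustration.

```python
# Minimal sketch (assumptions: model name and prompt are placeholders; the
# drop-in AutoModelForCausalLM here comes from intel_extension_for_transformers
# and quantizes weights to 4-bit on load).
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Once upon a time, a little girl", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```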

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 1614

openspg

OpenSPG is a Knowledge Graph Engine developed by Ant Group in collaboration with OpenKG, based on the SPG (Semantic-enhanced Programmable Graph) framework. Core capabilities: 1) domain-model-constrained knowledge modeling, 2) fused representation of facts and logic, 3) kNext SDK (Python): LLM-enhanced knowledge construction, reasoning, and generation.

Language: Java · License: Apache-2.0 · Stargazers: 448

LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Language: Python · License: Apache-2.0 · Stargazers: 8067

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 6995
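
Recent TensorRT-LLM releases also expose a high-level LLM API on top of the engine-building workflow; the sketch below uses that API, and the model name, sampling settings, and the availability of this API in the installed version are all assumptions for illustration.

```python
# Minimal sketch of the high-level LLM API shipped with recent TensorRT-LLM
# releases (assumptions: a recent version is installed, the model name is a
# placeholder, and engine build/quantization options are left at their defaults).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TRT engine
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does TensorRT-LLM optimize?"], params)
for out in outputs:
    print(out.outputs[0].text)
```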

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1933

ChatGLM-Finetuning

Fine-tuning of ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B on specific downstream tasks, covering Freeze, LoRA, P-Tuning, full-parameter fine-tuning, and more.

Language: Python · Stargazers: 2527

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python · License: Apache-2.0 · Stargazers: 23487