Linpeng Tang's starred repositories
ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.
LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
devika
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Fengshenbang-LM
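Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA research institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.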
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
torchtitan
A PyTorch-native library for large model training.
visualblocks
Visual Blocks for ML is a Google visual programming framework that lets you create ML pipelines in a no-code graph editor. You – and your users – can quickly prototype workflows by connecting drag-and-drop ML components, including models, user inputs, processors, and visualizations.
Awesome-LLMs-Datasets
A summary of existing representative text datasets for LLMs.
PyPaperBot
PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub.
alexandria
Full text search engine powering Alexandria.org - the open search engine.
altinity-dashboard
Altinity Dashboard helps you manage ClickHouse installations controlled by clickhouse-operator.
myscale-telemetry
Open-source observability for your LLM application.
tantivy_warc_indexer
Builds a tantivy index from Common Crawl warc.wet files.