Chunlin Wang's starred repositories
Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM related technologies as possible. 我们打造了方便研究人员上手和使用大模型等微调平台,我们欢迎开源爱好者发起任何有意义的pr!
CLUEDatasetSearch
搜索所有中文NLP数据集,附常用英文NLP数据集
LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
Explore-Instruct
EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration
awesome-synthetic-datasets
awesome synthetic (text) datasets
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大模型提供更高质量、更丰富、更易”消化“的数据!
llm-data-creation
Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators"
llama3-from-scratch
llama3 implementation one matrix multiplication at a time
sft_datasets
开源SFT数据集整理,随时补充
LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Awesome-Chinese-LLM
整理开源的中文大语言模型,以规模较小、可私有化部署、训练成本较低的模型为主,包括底座模型,垂直领域微调及应用,数据集与教程等。
ChineseNlpCorpus
搜集、整理、发布 中文 自然语言处理 语料/数据集,与 有志之士 共同 促进 中文 自然语言处理 的 发展。
AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
generative-ai-for-beginners
18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
AISuperDomain
Aila(AI超元域): The premier AI integration tool for Windows, macOS, and Android. Ask once, get answers from 10+ AIs like ChatGPT, Gemini, Claude3, Copilot, Poe, perplexity and more. Features customizable AI and prompts.
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
playground
Play with neural networks!
Awesome-Causal-RL
A curated list of causal reinforcement learning resources.