Tianpeng Bu's starred repositories
Awesome-GFlowNets
A curated list of resources about generative flow networks (GFlowNets).
Awesome-LLM-KG
Awesome papers about unifying LLMs and KGs
llm_benchmarks
A collection of benchmarks and datasets for evaluating LLM.
data_management_LLM
Collection of training data management explorations for large language models
distilabel
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
text-clustering
Easily embed, cluster and semantically label text datasets
llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
llm-data-creation
Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators"
AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大模型提供更高质量、更丰富、更易”消化“的数据!
LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
awesome-instruction-datasets
A collection of awesome-prompt-datasets, awesome-instruction-dataset, to train ChatLLM such as chatgpt 收录各种各样的指令数据集, 用于训练 ChatLLM 模型。
InstructUIE
Universal information extraction with instruction learning
Evaluation-of-ChatGPT-on-Information-Extraction
An Evaluation of ChatGPT on Information Extraction task, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE) and Aspect-based Sentiment Analysis (ABSA).
ChatIE
The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set defaul openai key. If keys exceed plan and are invalid, please tell us. The response speed depends on openai. ( sometimes, the official is too crowded and slow)
lm-evaluation-harness
A framework for few-shot evaluation of language models.
llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.