Tianpeng Bu's starred repositories

gflownet

GFlowNet library specialized for graph & molecular data

Language: Python | License: MIT | Stargazers: 180 | Issues: 0

torchgfn

GFlowNet library

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 205 | Issues: 0

Awesome-GFlowNets

A curated list of resources about generative flow networks (GFlowNets).

License: MIT | Stargazers: 369 | Issues: 0

Awesome-LLM-KG

Awesome papers about unifying LLMs and KGs

Stargazers: 1759 | Issues: 0

llm_benchmarks

A collection of benchmarks and datasets for evaluating LLMs.

Stargazers: 176 | Issues: 0

data_management_LLM

Collection of training data management explorations for large language models

Stargazers: 238 | Issues: 0

AugCon

Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity

Language: Python | License: MIT | Stargazers: 11 | Issues: 0

distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

Language: Python | License: Apache-2.0 | Stargazers: 1220 | Issues: 0

Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Stargazers: 437 | Issues: 0

deepeval

The LLM Evaluation Framework

Language: Python | License: Apache-2.0 | Stargazers: 2567 | Issues: 0

text-clustering

Easily embed, cluster and semantically label text datasets

Language: Python | License: Apache-2.0 | Stargazers: 404 | Issues: 0

llm-datasets

High-quality datasets, tools, and concepts for LLM fine-tuning.

Stargazers: 1149 | Issues: 0

llm-data-creation

Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators"

Language: Python | License: MIT | Stargazers: 103 | Issues: 0

AttrPrompt

[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.

Language: Python | License: Apache-2.0 | Stargazers: 129 | Issues: 0

pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

Language: Python | License: Apache-2.0 | Stargazers: 564 | Issues: 0

Language: Rust | License: Apache-2.0 | Stargazers: 1064 | Issues: 0

RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Language: Python | License: Apache-2.0 | Stargazers: 4469 | Issues: 0

Language: Python | Stargazers: 793 | Issues: 0

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 1858 | Issues: 0

FinNLP

Democratizing Internet-scale financial data.

Language: Jupyter Notebook | License: MIT | Stargazers: 1084 | Issues: 0

LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

License: MIT | Stargazers: 2301 | Issues: 0

awesome-instruction-datasets

A collection of prompt datasets and instruction datasets of all kinds for training ChatLLM models such as ChatGPT.

License: Apache-2.0 | Stargazers: 462 | Issues: 0

IEPile

[OneKE] [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus

Language: Python | License: NOASSERTION | Stargazers: 139 | Issues: 0

InstructUIE

Universal information extraction with instruction learning

Language: Python | License: MIT | Stargazers: 355 | Issues: 0

OpenNRE

An Open-Source Package for Neural Relation Extraction (NRE)

Language: Python | License: MIT | Stargazers: 4289 | Issues: 0

Evaluation-of-ChatGPT-on-Information-Extraction

An evaluation of ChatGPT on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE), and Aspect-based Sentiment Analysis (ABSA).

Language: Python | Stargazers: 120 | Issues: 0

KnowLM

An Open-Source Knowledgeable Large Language Model Framework.

Language: Python | License: MIT | Stargazers: 1152 | Issues: 0

ChatIE

The online version is temporarily unavailable because we cannot afford the key. You can clone the repository and run it locally. Note: we set a default OpenAI key. If the key exceeds its plan and becomes invalid, please tell us. The response speed depends on OpenAI (sometimes the official service is crowded and slow).

Language: Python | License: NOASSERTION | Stargazers: 768 | Issues: 0

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | Stargazers: 6016 | Issues: 0

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | Stargazers: 11072 | Issues: 0