Beast code in Giters

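MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.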

License: MIT | 341700

OpenChatKit

Language: Python | License: Apache-2.0 | 900300

Instructdial

Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

Language: Python | License: Apache-2.0 | 9600

TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python | License: NOASSERTION | 102100

llama-docker-playground

Quick Start LLaMA models with multiple methods, and fine-tune 7B/65B with One-Click.

Language: Python | License: GPL-3.0 | 35000

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python | License: Apache-2.0 | 686400

ChatRWKV

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

Language: Python | License: Apache-2.0 | 939300

Fengshenbang-LM

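Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language research center at the IDEA research institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.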

Language: Python | License: Apache-2.0 | 400600

gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more

Language: Python | License: MIT | 193700

pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

Language: Python | License: MIT | 17500

openai-cookbook

Examples and guides for using the OpenAI API

Language: MDX | License: MIT | 5884800

iPrompt

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Language: Python | 12100

minRLHF

A (somewhat) minimal library for finetuning language models with PPO on human feedback.

Language: Python | 8400

TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Language: Python | License: MIT | 53900

stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Language: Python | License: MIT | 884300

GPT2

An implementation of training for GPT2, supports TPUs

Language: Python | License: MIT | 141900

transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.

Language: Jupyter Notebook | 212400