Beast code in Giters

Inspired by Google's C4, a series of colossal clean data cleaning scripts for CommonCrawl data processing, including Chinese data processing and the cleaning methods in MassiveText.

Language: Jupyter Notebook | MIT | 300

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | Apache-2.0 | 419400

train-with-fsdp

Language: Python | MIT | 8900

tiny-openai-whisper-api

OpenAI Whisper API-style local server, running on FastAPI

Language: Python | MIT | 4800

awesome-mixture-of-experts

A collection of AWESOME things about mixture-of-experts

83700

Shepherd

This is the repo for the paper Shepherd -- A Critic for Language Model Generation

Language: Jupyter Notebook | NOASSERTION | 20300

Taiwan-LLM

Traditional Mandarin LLMs for Taiwan

Language: Python | Apache-2.0 | 113800

traditional_chinese_llama2

Fine-tune Llama 2 with a Traditional Chinese dataset

Language: Python | Apache-2.0 | 3700

traditional-chinese-alpaca

A Traditional-Chinese instruction-following model with datasets based on Alpaca.

Language: Python | Apache-2.0 | 13300

deep_learning_curriculum

Language model alignment-focused deep learning curriculum

116400

LLM-Eval

Language: Python | Apache-2.0 | 1000

langchain-ask-the-doc

Ask the Doc app built using Langchain and Streamlit.

Language: Python | 5700

open-instruct

Language: Python | Apache-2.0 | 108300

truthfulqa_experiments

Language: Python | 400

show-me-chatgpt-plugin

Create and edit diagrams in ChatGPT

Language: TypeScript | 67400

ProgramFC

Code for the ACL 2023 paper "Fact-Checking Complex Claims with Program-Guided Reasoning"

Language: Python | MIT | 2700

longeval-summarization

Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https://arxiv.org/abs/2301.13298).

Language: Python | Apache-2.0 | 4100

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

Apache-2.0 | 302400

LLMsPracticalGuide

A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)

900200

text-generation-inference

Large Language Model Text Generation Inference

Language: Python | Apache-2.0 | 835900

arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Language: Python | Apache-2.0 | 501500

LLMZoo

⚡LLM Zoo is a project that provides data, models, and an evaluation benchmark for large language models.⚡

Language: Python | Apache-2.0 | 290600

awesome-chatgpt-dataset

Unlock the Power of LLM: Explore These Datasets to Train Your Own ChatGPT!

Language: Python | GPL-3.0 | 67500

chatbot-ui

AI chat for every model.

Language: TypeScript | MIT | 2736200