Fubang ZHAO's starred repositories

llms_paper

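This repository mainly collects reading notes on top-conference papers relevant to LLM algorithm engineers (multimodal, PEFT, few-shot QA, RAG, LMM interpretability, Agents, CoT).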

Stargazers: 213 · Issues: 0

TigerBot

TigerBot: A multi-language multi-task LLM

Language: Python · License: Apache-2.0 · Stargazers: 2226 · Issues: 0

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

License: CC-BY-4.0 · Stargazers: 2852 · Issues: 0

Awesome-Chinese-LLM

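A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheap to train, covering base models, domain-specific fine-tunes and applications, datasets, and tutorials.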

Stargazers: 13813 · Issues: 0

MNBVC

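MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. It covers not only mainstream culture but also various niche subcultures, even "Martian script" (火星文, stylized internet slang). The MNBVC dataset includes every form of plain-text Chinese data: news, essays, novels, books, magazines, papers, script lines, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.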

License: MIT · Stargazers: 3261 · Issues: 0

InternLM

Official release of InternLM2.5 7B base and chat models. 1M context support

Language: Python · License: Apache-2.0 · Stargazers: 5906 · Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python · License: Apache-2.0 · Stargazers: 3468 · Issues: 0

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · Stargazers: 1014 · Issues: 0
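3D parallelism combines data, pipeline, and tensor parallelism, so the number of GPUs is the product of the three degrees. A minimal sketch of how a global rank could map to its three coordinates, assuming tensor-parallel ranks are adjacent (tp varies fastest, then pp, then dp); `rank_to_coords` is a hypothetical helper, not nanotron's API, and real frameworks may order the axes differently.

```python
def rank_to_coords(rank: int, tp: int, pp: int, dp: int) -> tuple:
    """Map a global rank to (dp_rank, pp_rank, tp_rank).

    Assumed layout: tensor-parallel ranks are contiguous, pipeline stages
    come next, and data-parallel replicas are the outermost axis.
    """
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world size {world_size}")
    tp_rank = rank % tp
    pp_rank = (rank // tp) % pp
    dp_rank = rank // (tp * pp)
    return dp_rank, pp_rank, tp_rank
```

For example, with tp=2, pp=4, dp=2 (16 GPUs), rank 7 maps to (0, 3, 1) and rank 8 starts the second data-parallel replica at (1, 0, 0). Keeping tensor-parallel ranks adjacent is a common choice because that axis communicates most often and benefits from sitting on the same node.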

JARVIS

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Language: Python · License: MIT · Stargazers: 23422 · Issues: 0

ChatGLM-Tuning

A fine-tuning solution based on ChatGLM-6B + LoRA

Language: Python · License: MIT · Stargazers: 3713 · Issues: 0
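LoRA freezes a pretrained weight W0 and learns only a low-rank update B A, scaled by alpha / r. A minimal pure-Python sketch of the LoRA forward pass for a single linear layer, following the formulation in the LoRA paper (A random, B zero at initialization); the helper names are hypothetical and this is not ChatGLM-Tuning's code.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def init_lora(d_in, d_out, r, seed=0):
    """A: r x d_in with small Gaussian values; B: d_out x r set to zero,
    so the low-rank update B A is zero when training starts."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def lora_forward(w0, a, b, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B train."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """LoRA trains r * (d_in + d_out) values instead of d_in * d_out."""
    return r * (d_in + d_out)
```

For a 4096 x 4096 projection with r = 8, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values instead of the 16,777,216 in the full matrix, which is why LoRA makes fine-tuning a 6B model feasible on modest hardware.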
License: NOASSERTION · Stargazers: 445 · Issues: 0

ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python · License: Apache-2.0 · Stargazers: 38420 · Issues: 0

docGPT

ChatGPT directly within Google Docs as an Editor Add-on 📑

Language: JavaScript · Stargazers: 659 · Issues: 0

Instruction-Tuning-Papers

Reading list of instruction tuning, a trend that started with Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).

Stargazers: 742 · Issues: 0

torchscale

Foundation Architecture for (M)LLMs

Language: Python · License: MIT · Stargazers: 2979 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 9539 · Issues: 0
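Megatron-LM's tensor parallelism splits individual weight matrices across GPUs. A minimal single-process sketch of the column-parallel idea, assuming each simulated rank owns a contiguous block of output features and that concatenating the partial outputs stands in for the all-gather (in a full transformer MLP the output may instead stay sharded and feed a row-parallel layer); the function names are hypothetical and this is not Megatron-LM's implementation.

```python
def matvec(w, x):
    """y = W x, with W stored as out_features rows of in_features values."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def column_parallel_matvec(w, x, world_size):
    """Simulate a column-parallel linear layer on one process.

    Each simulated rank holds out_features / world_size rows of W (one block
    of output features), computes its partial output from the full input x,
    and the partial outputs are concatenated, mimicking an all-gather.
    """
    out_features = len(w)
    if out_features % world_size != 0:
        raise ValueError("out_features must be divisible by world_size")
    shard = out_features // world_size
    partials = [matvec(w[r * shard:(r + 1) * shard], x) for r in range(world_size)]
    return [y for part in partials for y in part]
```

Because each rank needs the full input but only its own slice of the weights, no communication is required until the outputs are gathered, which is what makes this split cheap.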

llama-dl

High-speed download of LLaMA, Facebook's 65B parameter GPT model

Language: Shell · License: GPL-3.0 · Stargazers: 4167 · Issues: 0

ChatGPT

Reverse engineered ChatGPT API

Language: Python · License: GPL-2.0 · Stargazers: 27990 · Issues: 0

GPT4IE

An open-source and powerful Information Extraction toolkit based on GPT (GPT for Information Extraction; GPT4IE for short). Note: we set a default OpenAI key in the tool; let us know if the key reaches its limit.

Language: JavaScript · License: MIT · Stargazers: 167 · Issues: 0

PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

Language: Python · License: Apache-2.0 · Stargazers: 11803 · Issues: 0

nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

Language: Python · License: Apache-2.0 · Stargazers: 1729 · Issues: 0
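EDA-style augmentation creates new training sentences by applying simple random edits. A minimal sketch of two EDA operations, random swap and random deletion, applied at the character level as is common for Chinese text; this is a hypothetical illustration, not nlpcda's API, and the default rates (`n_swaps=1`, `p_delete=0.1`) are assumptions.

```python
import random


def random_swap(chars, n_swaps, rng):
    """Swap the characters at two random positions, n_swaps times."""
    chars = list(chars)
    if len(chars) < 2:
        return chars
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(chars)), 2)
        chars[i], chars[j] = chars[j], chars[i]
    return chars


def random_delete(chars, p, rng):
    """Drop each character independently with probability p."""
    chars = list(chars)
    if not chars:
        return chars
    kept = [c for c in chars if rng.random() >= p]
    # Never return an empty sentence: fall back to one random character.
    return kept if kept else [rng.choice(chars)]


def augment(text, n_aug=3, n_swaps=1, p_delete=0.1, seed=0):
    """Produce n_aug character-level variants of text (hypothetical defaults)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_aug):
        chars = random_swap(text, n_swaps, rng)
        chars = random_delete(chars, p_delete, rng)
        variants.append("".join(chars))
    return variants
```

Seeding a private `random.Random` instance keeps the augmentation reproducible without touching the global random state.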

aliyun-odps-python-sdk

ODPS Python SDK and data analysis framework

Language: Python · License: Apache-2.0 · Stargazers: 435 · Issues: 0

DocumentLayoutAnalysis

Document Layout Analysis resources repos for development with PdfPig.

Language: C# · Stargazers: 563 · Issues: 0

CasRel

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Accepted by ACL 2020.

Language: Python · License: MIT · Stargazers: 754 · Issues: 0
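CasRel first tags subject start and end positions, then, for each detected subject, runs relation-specific taggers for object start and end positions. A minimal sketch of the decoding step only, assuming per-token probabilities are already available and pairing each start with the nearest end at or after it; `obj_tagger` is a hypothetical stand-in for the relation-specific object taggers, not the repository's API.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each tagged start position with the nearest tagged end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def extract_triples(tokens, subj_start, subj_end, obj_tagger, threshold=0.5):
    """Cascade: decode subjects first, then objects per relation for each subject.

    obj_tagger(s, e) is a hypothetical callable returning
    {relation: (obj_start_probs, obj_end_probs)} conditioned on subject span (s, e).
    """
    triples = []
    for s, e in decode_spans(subj_start, subj_end, threshold):
        subject = " ".join(tokens[s:e + 1])
        for relation, (o_start, o_end) in obj_tagger(s, e).items():
            for o_s, o_e in decode_spans(o_start, o_end, threshold):
                triples.append((subject, relation, " ".join(tokens[o_s:o_e + 1])))
    return triples
```

Conditioning the object taggers on each subject is what lets the cascade handle overlapping triples, since the same token can be an object for several different subjects or relations.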

datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Language: Python · License: Apache-2.0 · Stargazers: 18818 · Issues: 0

BERT-NER

PyTorch-Named-Entity-Recognition-with-BERT

Language: Python · License: AGPL-3.0 · Stargazers: 1194 · Issues: 0
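A BERT NER model predicts one tag per token, and entity spans are then recovered from the tag sequence. A minimal sketch assuming the common BIO scheme (B-TYPE, I-TYPE, O); the repository's exact label set is not stated here, and `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (entity_type, start, end) spans with inclusive ends.

    Lenient decoding (an assumption): an I- tag that does not continue the
    current entity type starts a new entity instead of being discarded.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.append((etype, start, i - 1))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans
```

Sub-word tokenization complicates this in practice: tags are usually predicted only on the first word piece of each word, so the tag list passed in should already be aligned to words.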

OpenAttack

An Open-Source Package for Textual Adversarial Attack.

Language: Python · License: MIT · Stargazers: 665 · Issues: 0

TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Language: Python · License: MIT · Stargazers: 2850 · Issues: 0

DeepIE

DeepIE: Deep Learning for Information Extraction

Language: Python · Stargazers: 1931 · Issues: 0

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python · License: AGPL-3.0 · Stargazers: 9180 · Issues: 0
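cleanlab builds on confident learning. A simplified sketch of its core idea, assuming out-of-sample predicted probabilities are available: compute a per-class confidence threshold, then flag an example when the most likely class among those that clear their thresholds differs from its given label. `flag_label_issues` is a hypothetical helper, not cleanlab's API (the library provides `cleanlab.filter.find_label_issues`), and it omits the calibration and pruning options of the full method.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices whose given label disagrees with a confidently predicted class.

    labels: list of int class ids; pred_probs: list of per-class probability
    lists, ideally out-of-sample (e.g. from cross-validation).
    """
    n_classes = len(pred_probs[0])
    # Per-class threshold t_j: average predicted probability of class j
    # over the examples whose given label is j.
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        # Assumption: a class with no labeled examples gets threshold 1.0.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)

    issues = []
    for idx, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class clears its threshold; leave the example alone
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            issues.append(idx)
    return issues
```

Using per-class thresholds rather than a single global cutoff accounts for classes the model is systematically more or less confident about.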