DUT-LiuYang

DUT-LiuYang's starred repositories

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Language: Python | MIT | 167164 1553 2689

langchain

🦜🔗 Build context-aware reasoning applications

Language: Jupyter Notebook | MIT | 92972 681 7660

Prompt-Engineering-Guide

🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering

Language: MDX | MIT | 48202 536 183

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | MIT | 19621 302 1359

ChineseBQB

🇨🇳 Chinese sticker pack, more joy / A museum of sticker packs, the most addictive repository on GitHub, a large sticker collection that brings joy together

Language: JavaScript | 12177 163 86

ML-Papers-of-the-Week

🔥 Highlighting the top ML papers every week.

10004 841 4

trl

Train transformer language models with reinforcement learning.

Language: Python | Apache-2.0 | 9572 74 1124

easy-rl

Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at https://datawhalechina.github.io/easy-rl/

Language: Jupyter Notebook | NOASSERTION | 9185 79 143

LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Language: Python | Apache-2.0 | 8226 72 407

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python | Apache-2.0 | 7764 97 1582

awesome-pretrained-chinese-nlp-models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models

Language: Python | MIT | 4758 91 12

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | Apache-2.0 | 3831 23 520

MNBVC

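MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.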

MIT | 3414 64 53

Linly

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining and instruction fine-tuning datasets

Language: Python | 3028 51 134

pythia

The hub for EleutherAI's work on interpretability and learning dynamics

Language: Jupyter Notebook | Apache-2.0 | 2227 32 105

WebGLM

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

Language: Python | Apache-2.0 | 1557 25 70

GPT2-NewsTitle

Chinese NewsTitle Generation Project by GPT2. A Chinese GPT2 news headline generation project with extremely detailed comments.

Language: Python | Apache-2.0 | 1094 10 42

wordninja

Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

Language: Python | MIT | 802 10 21

MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)

Apache-2.0 | 639 14 22

BERT-whitening-pytorch

PyTorch version of BERT-whitening

Language: Python | MIT | 309 1 14

leetcode-java

🎓🎓🎓 LeetCode solutions in Java - 536/921 solved. https://leetcode.com/problemset/all/

Language: Java | 150 90

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

Language: Python | 135 6 6

GEC-Info

Repository to collect and categorize Grammatical Error Correction papers.

112 8 3

BANG is a new pretraining model that bridges the gap between Autoregressive (AR) and Non-autoregressive (NAR) generation. AR and NAR generation can be viewed uniformly in terms of the extent to which previous tokens can be attended to, and BANG bridges the two by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements.

Language: Python | MIT | 28 5 4