madehong

madehong's repositories

Seq2Seq4ATE

Code for the paper "Exploring Sequence-to-Sequence Learning for Aspect Term Extraction".

Language: Python · 13 3 1

bert-finetune

Code for fine-tuning BERT on various tasks.

Language: Python · 1 0 0

ALBERT

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Language: Python · License: Apache-2.0 · 0 0 0

We extend Alpaca with CoT (chain-of-thought) data to boost its reasoning ability, and we are constantly expanding our collection of instruction-tuning data. The instruction collection can be found at https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main

License: Apache-2.0 · 0 0 0

BELLE-prompt

BELLE: Bloom-Enhanced Large Language model Engine (an open-source Chinese dialogue large language model with 7 billion parameters)

License: Apache-2.0 · 0 0 0

CDial-GPT

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Language: Python · License: MIT · 0 0 0

Chinese-alpaca-lora

Luotuo (骆驼): A Chinese instruction-finetuned LLaMA. Developed by 陈启源 @ Central China Normal University, 李鲁鲁 @ SenseTime, and 冷子昂 @ SenseTime

License: Apache-2.0 · 0 0 0

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

License: AGPL-3.0 · 0 0 0

CLUECorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

License: MIT · 0 0 0

ERNIE2Pytorch

PyTorch version of ERNIE

License: MIT · 0 0 0

fast-bert

Super easy library for BERT-based NLP models

License: Apache-2.0 · 0 0 0

FLAN

License: Apache-2.0 · 0 0 0

GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

License: MIT · 0 0 0

LSH_Attention

Calculate the softmax layer of attention in O(L log L) time (L = sequence length) instead of O(L^2), using polytope locality-sensitive hashing (https://arxiv.org/abs/1802.05751).

License: MIT · 0 0 0