Weijie Liu's starred repositories
human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
Awesome-Code-LLM
A curated list of language-modeling research for code and related datasets.
Chinese-instruction-datasets
Chinese instruction-tuning datasets
Luotuo-Chinese-LLM
骆驼 (Luotuo, "camel"): open-source Chinese language models. Developed by 陈启源 (Chen Qiyuan) @ Central China Normal University, 李鲁鲁 (Li Lulu) @ SenseTime, and 冷子昂 (Leng Zi'ang) @ SenseTime
Chinese-LangChain
Chinese LangChain project | 小必应 ("Little Bing"), Q.Talk, 强聊 (QiangTalk)
CLUECorpus2020
Large-scale pre-training corpus for Chinese: 100 GB of Chinese pre-training text
TencentPretrain
Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo
BERT-whitening-pytorch
PyTorch version of BERT-whitening
Pytorch-Chinese-MultilLabel-Classification
Knowledge distillation using BERT for NLP tasks.
ChineseSemanticKB
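ChineseSemanticKB, a Chinese semantic knowledge base: a million-scale dictionary of common semantic terms for Chinese text processing, organized into 12 categories and including a 340,000-entry abstract-semantics library, a 340,000-entry antonym library, and a 430,000-entry synonym library. It supports applications such as sentence expansion, rewriting, and event abstraction and generalization.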
ChineseTextualInference
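ChineseTextualInference: a Chinese textual inference project covering corpus construction and an inference model, including the translation and construction of an 880,000-example Chinese textual entailment dataset and a deep-learning model for textual entailment classification.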
Financial-Knowledge-Graphs
Pipeline for building a small financial knowledge graph (Neo4j / Python / Cypher / KG)
MSMARCO-Passage-Ranking
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rank the most relevant passage as high as possible. Not every set of 1000 retrieved passages contains a human-labeled relevant passage. Evaluation will be done using MRR (mean reciprocal rank).
nlp_chinese_corpus
Large-scale Chinese corpus for NLP
CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
ArticlePairMatching
The code of ACL 2019 paper: Matching Article Pairs with Graphical Decomposition and Convolutions
awesome_Chinese_medical_NLP
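A curated collection of public resources for Chinese medical NLP: terminologies / corpora / word embeddings / pre-trained models / knowledge graphs / named entity recognition / QA / information extraction / models / papers / etc.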
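The MS MARCO entry above says that passage reranking is evaluated with MRR (mean reciprocal rank): for each query, take 1/rank of the first human-labeled relevant passage in the reranked list (0 if none appears), then average over queries. Below is a minimal sketch, assuming rankings and relevance labels are held in plain dictionaries keyed by query id. The function name `mrr_at_k` and this data layout are hypothetical and are not the repository's evaluation script. The default cutoff `k=10` reflects the MRR@10 commonly reported for MS MARCO passage ranking.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank, truncated at rank k (a hypothetical helper).

    rankings: query id -> passage ids in reranked order, best first.
    qrels:    query id -> set of human-labeled relevant passage ids.
    Averages over the labeled queries in `qrels`; a labeled query whose
    top-k list has no relevant passage (or no ranking at all) scores 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, gold in qrels.items():
        for rank, pid in enumerate(rankings.get(qid, [])[:k], start=1):
            if pid in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


if __name__ == "__main__":
    rankings = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p2"], "q3": ["p5"]}
    qrels = {"q1": {"p7"}, "q2": {"p4"}, "q3": {"p5"}}
    # q1 hits at rank 2 (0.5), q2 misses (0.0), q3 hits at rank 1 (1.0)
    print(mrr_at_k(rankings, qrels))  # 0.5
```

Averaging over the labeled queries, rather than over every query that was ranked, is a design assumption here: it keeps unlabeled queries from dragging the score toward zero, while a labeled query whose relevant passage the first-stage retriever (BM25 in the task above) never surfaced still counts as a miss.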