Weijie Liu (autoliuweijie)

Company: Peking University & Tencent Research

Location: Beijing

Home Page: www.weijieliu.com


Organizations
DataModelingGroup

Weijie Liu's starred repositories

human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"

Language: Python | License: MIT | Stargazers: 2182 | Issues: 0

Awesome-Code-LLM

A curated list of language modeling research for code and related datasets.

Stargazers: 1123 | Issues: 0

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | Stargazers: 11282 | Issues: 0

Chinese-instruction-datasets

Chinese instruction-tuning datasets

Stargazers: 104 | Issues: 0

Language: Python | Stargazers: 505 | Issues: 0

DRL

Deep Reinforcement Learning

License: NOASSERTION | Stargazers: 3088 | Issues: 0

llama2.c

Inference Llama 2 in one file of pure C

Language: C | License: MIT | Stargazers: 16920 | Issues: 0

Luotuo-Chinese-LLM

Luotuo (骆驼): Open-source Chinese language models. Developed by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子昂 @ SenseTime

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3627 | Issues: 0

WebGLM

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

Language: Python | License: Apache-2.0 | Stargazers: 1540 | Issues: 0

Linly

Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Language: Python | Stargazers: 3017 | Issues: 0

Chinese-LangChain

Chinese LangChain project | 小必应, Q.Talk, 强聊, QiangTalk

Language: Python | Stargazers: 2649 | Issues: 0

CLUECorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

License: MIT | Stargazers: 901 | Issues: 0

TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python | License: NOASSERTION | Stargazers: 1004 | Issues: 0

CSL

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

Language: Python | Stargazers: 548 | Issues: 0

AnyQ

FAQ-based Question Answering System

Language: C++ | License: Apache-2.0 | Stargazers: 2576 | Issues: 0

BERT-whitening-pytorch

PyTorch version of BERT-whitening

Language: Python | License: MIT | Stargazers: 307 | Issues: 0

Multi-CPR

[SIGIR 2022] Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval

Language: Python | Stargazers: 159 | Issues: 0

Pytorch-Chinese-MultilLabel-Classification

Knowledge distillation using BERT for NLP tasks.

Language: Python | Stargazers: 6 | Issues: 0

ChineseSemanticKB

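ChineseSemanticKB, a Chinese semantic knowledge base: 12 categories of million-scale common semantic dictionaries for Chinese language processing, including a 340,000-entry abstract semantic lexicon, a 340,000-entry antonym lexicon, and a 430,000-entry synonym lexicon. It supports applications such as sentence expansion, paraphrasing, and event abstraction and generalization.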

Language: Python | Stargazers: 726 | Issues: 0

ChineseTextualInference

ChineseTextualInference: a Chinese textual inference project, including the translation and construction of an 880,000-example Chinese textual entailment dataset and a deep-learning-based textual entailment model.

Language: Python | Stargazers: 161 | Issues: 0

ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Language: Python | License: Apache-2.0 | Stargazers: 1339 | Issues: 0

Financial-Knowledge-Graphs

A workflow for building a small financial knowledge graph (neo4j / python / cypher / KG)

Language: Jupyter Notebook | Stargazers: 2653 | Issues: 0

MSMARCO-Passage-Ranking

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rerank the most relevant passage as high as possible. For this task, not all 1000 relevant items have a human-labeled relevant passage. Evaluation will be done using MRR.

Language: Jupyter Notebook | License: MIT | Stargazers: 288 | Issues: 0

nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

License: MIT | Stargazers: 9314 | Issues: 0

SentEval

A Python tool for evaluating the quality of sentence embeddings.

Language: Python | License: NOASSERTION | Stargazers: 2071 | Issues: 0

CLUEDatasetSearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Language: Python | Stargazers: 4027 | Issues: 0

Luban

An easy-to-use 3-in-1 software tailor-made for Snapmaker machines.

Language: JavaScript | License: AGPL-3.0 | Stargazers: 433 | Issues: 0

ArticlePairMatching

Code for the ACL 2019 paper: Matching Article Pairs with Graphical Decomposition and Convolutions

Language: Python | License: NOASSERTION | Stargazers: 234 | Issues: 0

awesome_Chinese_medical_NLP

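A collection of public resources for Chinese medical NLP: terminologies / corpora / word embeddings / pre-trained models / knowledge graphs / named entity recognition / QA / information extraction / models / papers / etc.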

Stargazers: 2061 | Issues: 0

OpenKE

An Open-Source Package for Knowledge Embedding (KE)

Language: Python | Stargazers: 3767 | Issues: 0