caqnwpu

caq's starred repositories

algorithm_qa

Python implementations of the optimal algorithm solutions taught by Zuo Chengyun

Language: Python, License: MIT, 8800

TextMatch

PyTorch-based Chinese semantic similarity matching models (ABCNN, Albert, Bert, BIMPM, DecomposableAttention, DistilBert, ESIM, RE2, Roberta, SiaGRU, XlNet)

Language: Python, 78700

cacl2

Lexicon for Chinese lexical analysis and word segmentation

Language: Python, License: Apache-2.0, 11700

lanlanInterview

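This repository will contain basic introductions to the major banks and the characteristics of their written tests and interviews. Find this treasure trove and you are not far from landing the job.

Language: HTML, 113900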

ChineseEmbedding

Chinese embedding collection covering five types of vectors: token (character), POS tag, pinyin, dependency, and word embeddings.

Language: Python, 45000

glyce

Code for NeurIPS 2019 - Glyce: Glyph-vectors for Chinese Character Representations

Language: Python, License: Apache-2.0, 42100

text_matching

Text matching models such as DSSM, ESIM, ABCNN, and BIMPM; the dataset is the official LCQMC data

Language: Python, 46700

text_matching

TensorFlow versions of common text matching models; the dataset is QA_corpus. Continuously updated.

Language: Python, License: Apache-2.0, 67400

LeetcodeTop

A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥

ConSERT

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Language: Python, 53900

Awesome-Chinese-Corpus-Datasets-and-Models

Awesome Chinese Corpus Datasets and Models.

Language: Python, 1500

Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)

Language: Python, License: Apache-2.0, 968900

Interview-site-Lan

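High-frequency interview questions from major tech companies, plus e-books. This repository serves as a one-stop service for interviews: it contains real interview questions, résumé templates, and back-end technical essentials, as well as everyday topics such as rental pitfalls. A truly heartwarming repository.

Language: HTML, 37300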

free-programming-books-zh_CN

:books: Free Chinese-language computer programming books; contributions welcome

License: GPL-3.0, 11171800

leetcode

LeetCode Solutions: A Record of My Problem Solving Journey.

Language: JavaScript, License: NOASSERTION, 5473600

FinBERT

License: Apache-2.0, 67700

albert_zh

A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS; large-scale Chinese pre-trained ALBERT models

Language: Python, 393300

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language: Python, License: Apache-2.0, 13503800

ArticutAPI

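API of Articut, a Chinese word segmenter with semantic part-of-speech tagging. Word segmentation (「斷詞」, also called 「分詞」) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.

Language: Python, License: MIT, 40800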

covid-papers-browser

Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖

Language: CSS, License: GPL-2.0, 18200

SciBERT_CN

Pretrained model for Chinese Scientific Text

License: MIT, 4300

Chinese-Word-Vectors

100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors

Language: Python, License: Apache-2.0, 1183700

nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g., Elasticsearch)

Language: Python, License: Apache-2.0, 67500

CS-Notes

My self-study notes, updated for life; currently focused on systems fundamentals and MLSys.

Language: Python, 383000

docsearch

:blue_book: The easiest way to add search to your documentation.

Language: TypeScript, License: MIT, 401400

ETM

Topic Modeling in Embedding Spaces

Language: Python, License: MIT, 54400

Computational-Journalism-for-People-s-Daily-Opinion

Statistical topic models, one of the subfields of machine learning and natural language processing, provide a data-driven framework for analyzing collections of text documents. Topic modeling has become one of the most frequently used tools in computational journalism for investigating the abstract topics and keywords that occur in a collection of text documents. Digital journalists can use such tools to extract frequently appearing terms and to analyze the trend of a particular news brand or of stories about a social event. Articles, analyses, and documents written in Chinese have become increasingly important for multimedia stories about China. Chinese archives available on the Internet might contain stories that require digital journalists to apply appropriate topic modeling tools. Unlike English and other alphabetic languages, the basic structural unit of the Chinese language is the character, encoded in Guobiao GB18030 or Unicode. I implement apps for computational journalism using the Chinese topic-modeling tool jieba. This app analyzes articles from the opinion archive of the People’s Daily and generates a list of frequently appearing words using the keyword extraction tool provided by the jieba library.

Language: Objective-C, 800

HanLP

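Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification, clustering, pinyin conversion, simplified/traditional Chinese conversion, natural language processing

Language: Python, License: Apache-2.0, 3391500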

nlp_resource

A collection of natural language processing resources, curated for personal needs

7000

nlp-journey

Documents, papers, and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc.

License: Apache-2.0, 160900