caq's starred repositories

algorithm_qa

Python implementations of the optimal algorithm solutions taught by Zuo Chengyun

Language: Python | License: MIT | Stargazers: 88 | Issues: 0

TextMatch

PyTorch-based Chinese semantic similarity matching models (ABCNN, Albert, Bert, BIMPM, DecomposableAttention, DistilBert, ESIM, RE2, Roberta, SiaGRU, XlNet)

Language: Python | Stargazers: 787 | Issues: 0

cacl2

Lexicon for Chinese lexical analysis; a word-segmentation dictionary for Chinese

Language: Python | License: Apache-2.0 | Stargazers: 117 | Issues: 0

lanlanInterview

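This repository will cover basic introductions to the major banks and the characteristics of their written tests and interviews. Find this treasure trove and landing the job is not far off.

Language: HTML | Stargazers: 1139 | Issues: 0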

ChineseEmbedding

A collection of Chinese embeddings for natural language processing, covering five types of vectors: character (token), pinyin, word, part-of-speech tag, and dependency relation.

Language: Python | Stargazers: 450 | Issues: 0

glyce

Code for NeurIPS 2019 - Glyce: Glyph-vectors for Chinese Character Representations

Language: Python | License: Apache-2.0 | Stargazers: 421 | Issues: 0

text_matching

Text matching models including DSSM, ESIM, ABCNN, and BIMPM; the dataset is the official LCQMC data

Language: Python | Stargazers: 467 | Issues: 0

text_matching

TensorFlow versions of commonly used text matching models; the dataset is QA_corpus. Continuously updated.

Language: Python | License: Apache-2.0 | Stargazers: 674 | Issues: 0

LeetcodeTop

A collection of the high-frequency LeetCode problems that major internet companies tend to ask 🔥

Stargazers: 18712 | Issues: 0

ConSERT

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Language: Python | Stargazers: 539 | Issues: 0

Awesome-Chinese-Corpus-Datasets-and-Models

Awesome Chinese Corpus Datasets and Models.

Language: Python | Stargazers: 15 | Issues: 0

Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

Language: Python | License: Apache-2.0 | Stargazers: 9689 | Issues: 0

Interview-site-Lan

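High-frequency interview questions from major tech companies, plus e-books. This repository is a one-stop service for interviews: it includes real interview questions, resume templates, and backend essentials, as well as everyday topics such as rental pitfalls. A truly heartwarming repository.

Language: HTML | Stargazers: 373 | Issues: 0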

free-programming-books-zh_CN

:books: Free Chinese-language books on computer programming; contributions welcome

License: GPL-3.0 | Stargazers: 111718 | Issues: 0

leetcode

LeetCode Solutions: A Record of My Problem Solving Journey.

Language: JavaScript | License: NOASSERTION | Stargazers: 54736 | Issues: 0

License: Apache-2.0 | Stargazers: 677 | Issues: 0

albert_zh

A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS; a large collection of pre-trained Chinese ALBERT models

Language: Python | Stargazers: 3933 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 135038 | Issues: 0

ArticutAPI

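API of Articut, a Chinese word segmentation tool (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.

Language: Python | License: MIT | Stargazers: 408 | Issues: 0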

covid-papers-browser

Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖

Language: CSS | License: GPL-2.0 | Stargazers: 182 | Issues: 0

SciBERT_CN

Pretrained model for Chinese Scientific Text

License: MIT | Stargazers: 43 | Issues: 0

Chinese-Word-Vectors

100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors

Language: Python | License: Apache-2.0 | Stargazers: 11837 | Issues: 0

nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch)

Language: Python | License: Apache-2.0 | Stargazers: 675 | Issues: 0

CS-Notes

My self-study notes, updated for life; currently focused on systems fundamentals and MLSys.

Language: Python | Stargazers: 3830 | Issues: 0

docsearch

:blue_book: The easiest way to add search to your documentation.

Language: TypeScript | License: MIT | Stargazers: 4014 | Issues: 0

ETM

Topic Modeling in Embedding Spaces

Language: Python | License: MIT | Stargazers: 544 | Issues: 0

Computational-Journalism-for-People-s-Daily-Opinion

Statistical topic models, one of the subfields of machine learning and natural language processing, provide a data-driven framework for analyzing collections of text documents. They have become some of the most frequently used tools in computational journalism for investigating the abstract topics and keywords that occur in a collection of text documents. Digital journalists can use such tools to extract frequently appearing terms and to analyze the trend of a particular news brand or of stories about a social event. Articles, analyses, and documents written in Chinese have become increasingly important for multimedia stories about China. Chinese archives available on the Internet may contain stories that require digital journalists to apply appropriate topic-modeling tools. Unlike English and other alphabetic languages, the basic structural unit of the Chinese language is the character, encoded in Guobiao GB18030 or Unicode. I implement apps for computational journalism using the Chinese topic-modeling tool jieba. This app analyzes articles from the opinion archive of the People's Daily and generates a list of frequently appearing words using the keyword extraction tool provided by the jieba library.

Language: Objective-C | Stargazers: 8 | Issues: 0

HanLP

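Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, natural language processing

Language: Python | License: Apache-2.0 | Stargazers: 33915 | Issues: 0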

nlp_resource

A collection of natural language processing resources compiled for personal needs

Stargazers: 70 | Issues: 0

nlp-journey

Documents, papers, and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc.

License: Apache-2.0 | Stargazers: 1609 | Issues: 0