caq's starred repositories
algorithm_qa
Python implementations of the optimal algorithm solutions taught by Zuo Chengyun
lanlanInterview
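This repository will cover basic introductions to the major banks and the characteristics of their written tests and interviews. Find this treasure trove and landing the job is not far off, hmph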
ChineseEmbedding
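Chinese embedding collection including token, POS tag, pinyin, dependency, and word embeddings. A collection of vectors for Chinese natural language processing, covering character vectors, pinyin vectors, word vectors, part-of-speech vectors, and dependency-relation vectors: five types in total.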
text_matching
Text matching models such as DSSM, ESIM, ABCNN, and BIMPM; the dataset is the official LCQMC data
text_matching
TensorFlow (tf) versions of commonly used text matching models; the dataset is QA_corpus. Continuously updated
LeetcodeTop
A roundup of the high-frequency LeetCode problems that major internet companies tend to ask 🔥
Awesome-Chinese-Corpus-Datasets-and-Models
Awesome Chinese Corpus Datasets and Models.
Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
Interview-site-Lan
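High-frequency interview questions from major tech companies, plus e-books. This repository serves as a one-stop service for interviews, including real interview questions, résumé templates, and back-end technology essentials, along with everyday-life topics such as rental pitfalls. A genuinely heartwarming repository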
free-programming-books-zh_CN
:books: Free Chinese-language books on computer programming; contributions welcome
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
ArticutAPI
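API of Articut, a Chinese word segmentation tool with semantic part-of-speech tagging. Word segmentation (「斷詞」, also called 「分詞」) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.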
covid-papers-browser
Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖
SciBERT_CN
Pretrained model for Chinese Scientific Text
Chinese-Word-Vectors
100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors
Computational-Journalism-for-People-s-Daily-Opinion
Statistical topic models, one of the subfields of machine learning and natural language processing, provide a data-driven framework for analyzing collections of text documents. They have become one of the most frequently used tools in computational journalism for investigating the abstract topics and keywords that occur in a collection of text documents. Digital journalists can use such tools to extract frequently appearing terms and to analyze the trend of a particular news brand or of stories about a social event. Articles, analyses, and documents written in Chinese have become increasingly important for multimedia stories about China. Chinese archives available on the Internet might contain stories that require digital journalists to apply appropriate topic-modeling tools. Unlike English and other alphabetic languages, the basic structural unit of the Chinese language is the character, encoded in Guobiao GB18030 or Unicode. I implement apps for computational journalism using the Chinese topic-modeling tool jieba. This app analyzes articles from the opinion archive of the People's Daily and generates a list of frequently appearing words using the keyword extraction tool provided by the jieba library.
nlp_resource
A collection of natural language processing resources compiled for personal needs
nlp-journey
Documents, papers, and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc.