KunWangR

KunWangR's starred repositories

nlp_xiaojiang

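Natural language processing (NLP): Xiaojiang bot (a retrieval-based chit-chat chatbot), BERT sentence vectors and similarity (Sentence Similarity), XLNet sentence vectors and similarity (text xlnet embedding), text classification, entity extraction (NER, BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), paraphrase and synonym generation, sentence backbone extraction (mainpart), Chinese short-text similarity, text feature engineering, and model serving via keras-http-service.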

Language: Python · License: MIT · Stars: 1,515

rank_bm25

A Collection of BM25 Algorithms in Python

Language: Python · License: Apache-2.0 · Stars: 918
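
To make the BM25 entry concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not rank_bm25's code: the defaults k1=1.5 and b=0.75 and the Lucene-style smoothed IDF, log(1 + (N - n + 0.5) / (n + 0.5)), are assumptions, and rank_bm25's own variants handle IDF differently.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against a tokenized `query` (Okapi BM25 sketch)."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n_q = df[term]
            # Smoothed IDF (assumed Lucene-style variant), always positive.
            idf = math.log(1 + (n_docs - n_q + 0.5) / (n_q + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = [
    "hello there good man".split(),
    "it is quite windy in london".split(),
    "how is the weather today".split(),
]
print(bm25_scores("windy london".split(), corpus))
```

Only the second document contains the query terms, so it is the only one with a non-zero score.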

search_server

A Chinese/Pinyin search-term service built on a key tree (trie)

Language: Python · Stars: 1

SearchTrie

Trie (implements simple prefix matching)

Language: Java · License: MIT · Stars: 7
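
Prefix matching with a trie, as in SearchTrie and search_server, can be sketched in a few lines. SearchTrie itself is written in Java, so this Python version is only an illustration of the data structure. The class and method names (`Trie`, `insert`, `starts_with`) are hypothetical and are not taken from either repository.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Walk/create one node per character and mark the last node as a word end."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        """Return all stored words that begin with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix
        # Depth-first walk of the subtree under the prefix node.
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, path = stack.pop()
            if cur.is_word:
                results.append(path)
            for ch, child in cur.children.items():
                stack.append((child, path + ch))
        return sorted(results)


t = Trie()
for w in ["北京", "北京大学", "北海", "上海"]:
    t.insert(w)
print(t.starts_with("北京"))
```

Lookup cost depends on the prefix length plus the size of the matching subtree, not on the total number of stored words. This is why tries suit as-you-type suggestion services.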

pinyin4py

Convert Chinese characters to Pinyin

Language: Python · Stars: 42

Pinyin2Hanzi

Pinyin to Chinese characters; a Pinyin input method engine (pin yin -> 拼音)

Language: Python · Stars: 580

MAX-Chinese-Phonetic-Similarity-Estimator

Estimate the phonetic distance between Chinese words and get similar sounding candidate words.

Language: Python · License: Apache-2.0 · Stars: 34

faiss

A library for efficient similarity search and clustering of dense vectors.

Language: C++ · License: MIT · Stars: 29,438
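
In faiss, the exact baseline index is `IndexFlatL2`. You build it with `faiss.IndexFlatL2(d)`, fill it with `index.add(xb)`, and query it with `D, I = index.search(xq, k)`, which returns squared L2 distances and row indices. The pure-Python sketch below computes the same brute-force result without faiss, to show what a flat index does. It is not faiss's implementation, which relies on optimized BLAS/SIMD kernels and also provides approximate indexes. The function name `l2_knn` is hypothetical.

```python
import heapq


def l2_knn(database, queries, k):
    """Exact k-NN by squared L2 distance; returns, per query, a list of (distance, index)."""
    results = []
    for q in queries:
        # Squared Euclidean distance from the query to every database vector.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(q, v)), idx)
            for idx, v in enumerate(database)
        ]
        # Keep the k closest; ties are broken by the lower index.
        results.append(heapq.nsmallest(k, dists))
    return results


db = [[0, 0], [1, 0], [5, 5]]
print(l2_knn(db, [[1, 1]], 2))
```

Brute force costs O(n·d) per query. The approximate indexes in faiss exist to cut that cost when the database is large.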

pretrained-models

Open Language Pre-trained Model Zoo

License: Apache-2.0 · Stars: 984

SentenceSimilarity

The enhanced RCNN model used for sentence similarity classification

Language: Python · Stars: 43

fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Language: Python · License: Apache-2.0 · Stars: 3,046

TextBrewer

A PyTorch-based knowledge distillation toolkit for natural language processing

Language: Python · License: Apache-2.0 · Stars: 1,568
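
Knowledge distillation toolkits such as TextBrewer commonly train a student on the teacher's temperature-softened output distribution. The sketch below shows the standard soft-target loss from Hinton et al. (2015), T² · KL(p_teacher ‖ p_student), in plain Python. It does not use TextBrewer's PyTorch API, and the default temperature of 2.0 is an arbitrary assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this term is usually mixed with an ordinary cross-entropy loss on the hard labels.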

awesome-bert

BERT NLP papers, applications and GitHub resources, including the newest XLNet work.

Keyword-BERT

Language: Python · Stars: 278

text2vec

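text2vec, text to vector. A text-representation toolkit that converts text into vector matrices. It implements Word2Vec, RankBM25, Sentence-BERT, CoSENT and other text-representation and text-similarity models, ready to use out of the box.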

Language: Python · License: Apache-2.0 · Stars: 4,259

wsdm_cup_2020_solution

First place solution of WSDM CUP 2020, pairwise-bert, lightgbm

Language: Python · Stars: 89

SequentialEventExtration

Sequential event experiment based on travel notes crawled from XieCheng (Ctrip): collecting 500K Ctrip travel notes and building a sequential event graph from them.

Language: Python · Stars: 174

Pinyin2Chinese

Self-implemented Pinyin2Chinese demo using a Trie and an HMM: a simple implementation of Pinyin segmentation and Pinyin-to-Chinese conversion based on a hidden Markov model and a Trie.

Language: Python · Stars: 82
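
The Pinyin segmentation step mentioned above splits an unspaced string such as "xianzai" into syllables before any HMM decoding. The sketch below covers only that step. It uses a dynamic program that prefers the fewest syllables, a common heuristic. It is not the repository's Trie-based method, and it does not perform the HMM conversion to characters. The tiny `SYLLABLES` set and the function name are hypothetical.

```python
# Hypothetical toy syllable inventory; a real system would load all ~400 Mandarin syllables.
SYLLABLES = {"xi", "xian", "an", "zai", "wo", "men", "ni", "hao"}


def segment_pinyin(text, syllables=SYLLABLES):
    """Split unspaced pinyin into the fewest valid syllables, or return None if impossible."""
    max_len = max(len(s) for s in syllables)
    n = len(text)
    # best[i] is the best segmentation of text[:i], or None if text[:i] cannot be segmented.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if best[j] is not None and piece in syllables:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]


print(segment_pinyin("xianzai"))
```

"Fewest syllables" picks "xian zai" over "xi an zai". Genuine ambiguities such as 西安 (xi'an) versus 先 (xian) are exactly what the HMM in the repository is meant to resolve.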

MPyWE

Morpheme, Pinyin Enhanced Word Embedding

Language: Python · Stars: 3

pinyin2hanzi

End-to-end translation of Chinese phonetics to characters using bi-directional RNN (LSTM/GRU)

Language: Python · Stars: 27

CLUEPretrainedModels

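A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity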

Language: Python · Stars: 793

Task-Oriented-Dialogue-Research-Progress-Survey

A survey of datasets and methods for task-oriented dialogue, including recent datasets and SOTA leaderboards.

Chatbot_CN

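A chatbot for the finance and judicial domains (also capable of chit-chat). Its main modules include information extraction, NLU, NLG and a knowledge graph. Django integrates a front-end display, and RESTful interfaces for NLP and KG are already packaged.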

License: Apache-2.0 · Stars: 1,273

LexiconAugmentedNER

Reject complicated operations for incorporating lexicon for Chinese NER.

Language: Python · Stars: 432

IRGAN

IRGAN: GAN for IR, SIGIR 2017, Thesis Introduction

Language: Jupyter Notebook · Stars: 9

IRGAN

IRGAN for QA

Language: Python · Stars: 1

IRGAN-AnswerSelection

Language: Python · Stars: 8

Jay_KG

A knowledge-graph question-answering system over Jay Chou (周杰伦) song information

Language: Python · Stars: 133

Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python · Stars: 2,990

keras_to_tensorflow

General code to convert a trained keras model into an inference tensorflow model

Language: Python · License: MIT · Stars: 1,666