Wang Shi's repositories
MLToolkits
Toolkits used to train ML models
aerosolve
A machine learning package built for humans.
analyzer-solr
Analyzer adapter for Solr 5. Jieba is supported now; Stanford support is planned.
Awesome-Chinese-NLP
A curated list of resources for Chinese NLP (Natural Language Processing)
Chinese-clinical-NER
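CCKS2019 Chinese named entity recognition task. Recognizes six types of named entities in medical text: diseases and diagnoses, anatomical sites, imaging examinations, laboratory tests, surgeries, and drugs. Implemented so far: a baseline built on jieba and an Aho-Corasick automaton, and a sequence labeling model built on BiLSTM and CRF. The BERT code is mainly derived from https://github.com/charles9n/bert-sklearn.git; thanks to the author. The model scores 0.81 on the final test set, which leaves considerable room for improvement. It can serve as a baseline.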
ClickhouseMeetup
Materials from the ClickHouse Meetup in China
dict_build
Automatically build a Chinese lexicon: builds a dictionary from large Chinese text using an unsupervised method. Algorithm: http://www.matrix67.com/blog/archives/5044
easynpr
A website for reading NPR news easily
HanLP
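Natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, keyword extraction, new word discovery, phrase extraction, automatic summarization, text classification, pinyin conversion, Simplified/Traditional Chinese conversion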
icd-10
ICD-10 dictionary
Information-Extraction-Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
jieba-analysis
Jieba Chinese word segmentation (Java version)
scikit-learn
scikit-learn: machine learning in Python
scrapy-proxies
Random proxy middleware for Scrapy
setup
AWS EC2 setup files for Startup Engineering MOOC.
startup
startup course
superspider
Scrapy-based super spider
SUTDAnnotator
YEDDA: A Lightweight Collaborative Text Span Annotation Tool
word2vec
Automatically exported from code.google.com/p/word2vec