geeseek

Hangzhou, China

Wang Shi's repositories

MLToolkits

Toolkits used to train ML models

Language: Python | 6 20

perl_tool

Perl toolkits

1 20

toolkits

just toolkits

Language: Python | 1 20

aerosolve

A machine learning package built for humans.

Language: Java | Apache-2.0 | 000

analyzer-solr

Analyzer adapter for Solr 5; supports Jieba, with Stanford support planned.

Language: Java | 020

Awesome-Chinese-NLP

A curated list of resources for Chinese NLP (Natural Language Processing)

020

bitstarter

Language: JavaScript | 020

cayley

An open-source graph database

Language: Go | Apache-2.0 | 020

Chinese-clinical-NER

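CCKS 2019 Chinese named entity recognition task. Recognizes six types of named entities in medical text: diseases and diagnoses, anatomical sites, imaging examinations, laboratory tests, surgeries, and drugs. Implemented so far: a baseline built with jieba and an Aho-Corasick automaton, and a sequence labeling model based on BiLSTM and CRF. The BERT code is mainly derived from https://github.com/charles9n/bert-sklearn.git; thanks to the author. The model scores 0.81 on the final test set, which leaves considerable room for improvement. It can serve as a baseline.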

Language: Python | 010

clib

Libraries for C/C++

Language: C++ | 020

ClickhouseMeetup

Materials from the ClickHouse Meetup in China

Language: HTML | 010

dict_build

Automatically builds a Chinese lexicon: builds a dictionary from a large Chinese text corpus using an unsupervised method. Algorithm: http://www.matrix67.com/blog/archives/5044

Language: Java | Apache-2.0 | 000

easynpr

A website for reading NPR news easily

Language: JavaScript | 000

geeseek.github.io

Technical articles

Language: CSS | 000

HanLP

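Natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, keyword extraction, new word discovery, phrase extraction, automatic summarization, text classification, pinyin conversion, and Simplified/Traditional Chinese conversion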

Language: Java | Apache-2.0 | 000

icd-10

ICD-10 dictionary

000

Information-Extraction-Chinese

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT

Language: Python | 020

jieba

Jieba Chinese word segmentation

Language: Python | 020

jieba-analysis

Jieba word segmentation (Java version)

Language: Java | Apache-2.0 | 000

mmseg4j

MMSEG Chinese analyzer for Java Lucene or Solr; see http://technology.chtsai.org/mmseg/

Language: Java | Apache-2.0 | 020

npr

Language: Python | 000

scikit-learn

scikit-learn: machine learning in Python

Language: C | NOASSERTION | 000

scrapy-proxies

Random proxy middleware for Scrapy

Language: Python | 000

setup

AWS EC2 setup files for Startup Engineering MOOC.

Language: Shell | 000

startup

startup course

000

superspider

Scrapy-based super spider

Language: Python | 000

SUTDAnnotator

YEDDA: A Lightweight Collaborative Text Span Annotation Tool

Language: Python | Apache-2.0 | 000

word2vec

Automatically exported from code.google.com/p/word2vec

Language: C | Apache-2.0 | 000