Elliot's repositories
spark-ml-source-analysis
Analysis of Spark ML algorithm principles and their concrete source code implementations
angel
A Flexible and Powerful Parameter Server for large-scale machine learning
ansj_seg
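ansj word segmentation: a true Java implementation of ICT. Both segmentation quality and speed exceed the open-source version of ICT. Chinese word segmentation, person name recognition, part-of-speech tagging, user-defined dictionaries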
camus
LinkedIn's Kafka to HDFS pipeline.
canal
Alibaba's incremental subscription and consumption component for MySQL database binlog
cws_evaluation
Open-source Java project cws_evaluation: evaluation and comparison of the segmentation quality of Chinese word segmenters
deeplearningbook-chinese
Deep Learning Book Chinese Translation
disconf
Distributed Configuration Management Platform
elasticsearch
Open Source, Distributed, RESTful Search Engine
faiss
A library for efficient similarity search and clustering of dense vectors.
Familia
A Toolkit for Chinese Topic Modeling
FM_FTRL
Hashed Factorization Machine with Follow The Regularized Leader for Kaggle Avazu Click-Through Rate Competition
fnlp
Toolkit for Chinese natural language processing
gobblin
Universal data ingestion framework for Hadoop.
HanLP
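Han Language Processing package: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, keyword extraction, automatic summarization, phrase extraction, pinyin conversion, Simplified/Traditional Chinese conversion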
incubator-airflow
Apache Airflow (Incubating)
jieba
Jieba Chinese word segmentation
jstorm
Java Storm
kafka-manager
A tool for managing Apache Kafka.
KafkaOffsetMonitor
A little app to monitor the progress of Kafka consumers and their lag with respect to the queue.
learning-spark
Example code from Learning Spark book
liblinear-java
Java version of LIBLINEAR
ltp
Language Technology Platform
Online-Random-Bit-Regression-FTRL
Online Random Bit Regression with FTRL-Proximal in Python
scikit-learn
scikit-learn: machine learning in Python
snownlp
Python library for processing Chinese text
ssdb
SSDB - A fast NoSQL database, an alternative to Redis