Simon Lee's repositories
data-science-from-scratch
Code for the book Data Science from Scratch
firepad
Collaborative Text Editor Powered by Firebase
iNEXT
R package for interpolation and extrapolation
jieba
Jieba Chinese text segmentation
keywordfinder
Automatic keyword extraction - no alchemy required!
lexrank-summarizer
Spark-based LexRank extractive summarizer
Naive-Bayes-Classifier
Naive Bayes text classifier
nltk_book
NLTK Book
ProgrammingWithScalding
Programming MapReduce with Scalding
pydata-book
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
pyDataScienceToolkits_Base
An introduction to the Python data analysis tools NumPy, Pandas, Matplotlib, and Scikit-learn, in IPython Notebook format
PyMySQL
PyMySQL: Pure-Python MySQL Client
RAKE
A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm
scala-tfidf
Keyword extraction
scoobi
A Scala productivity framework for Hadoop.
sedis
A thin Scala wrapper for Jedis (https://github.com/xetorthio/jedis)
SegPhrase-MultiLingual
SegPhrase working on Chinese and Arabic
simhash
A Python implementation of the Simhash algorithm
snownlp
Python library for processing Chinese text
spark
Mirror of Apache Spark
spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
spark-scalding
Use Cascading Taps and Scalding DSL with Spark
spree
Live-updating Spark UI built with Meteor
SpyGlass
Cascading and Scalding wrapper for HBase with advanced read features
TextClassify
Chinese text classifier; simple to train, with multiple models to choose from.
TextRank
Python implementation of TextRank algorithm (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) for automatic keyword extraction and summarization using Levenshtein distance as relation between text units.
TextRank4ZH
Automatically extract keywords and summaries from Chinese text
textstat
Calculate statistics from text