qingniufly

Shanghai

Simon Lee's repositories

canopy-clustering-spark

Language: Scala

data-science-from-scratch

Code for the book Data Science From Scratch

Language: Python | License: Unlicense

firepad

Collaborative Text Editor Powered by Firebase

Language: JavaScript | License: NOASSERTION

iNEXT

R package for interpolation and extrapolation

Language: R

jieba

Jieba Chinese word segmentation

Language: Python | License: MIT

keywordfinder

Automatic keyword extraction - no alchemy required!

Language: Python

LDAvis

Language: JavaScript | License: NOASSERTION

lexrank-summarizer

Spark-based LexRank extractive summarizer

Language: Scala | License: MIT

Naive-Bayes-Classifier

Naive Bayes text classifier

Language: Python

nltk_book

NLTK Book

Language: TeX

ProgrammingWithScalding

Programming MapReduce with Scalding

Language: Scala | License: NOASSERTION

pydata-book

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

Language: Python

pyDataScienceToolkits_Base

An introduction to the Python data analysis tools NumPy, Pandas, Matplotlib, and Scikit-learn, in IPython Notebook format

Language: Jupyter Notebook | License: MIT

PyMySQL

PyMySQL: Pure-Python MySQL Client

Language: Python | License: MIT

RAKE

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm

Language: Python | License: MIT

scala-tfidf

Keyword extraction

Language: Scala | License: MIT

scoobi

A Scala productivity framework for Hadoop.

Language: Scala

sedis

A thin Scala wrapper for jedis (https://github.com/xetorthio/jedis)

Language: Scala | License: MIT

SegPhrase-MultiLingual

SegPhrase working on Chinese and Arabic

Language: C++

simhash

A Python Implementation of Simhash Algorithm

Language: Python | License: MIT

snownlp

Python library for processing Chinese text

Language: Python | License: MIT

spark

Mirror of Apache Spark

Language: Scala | License: Apache-2.0

spark-hyperloglog

Interactive Audience Analytics with Spark and HyperLogLog

Language: Scala | License: MIT

spark-scalding

Use Cascading Taps and Scalding DSL with Spark

Language: Scala | License: Apache-2.0

spree

Live-updating Spark UI built with Meteor

Language: JavaScript | License: Apache-2.0

SpyGlass

Cascading and Scalding wrapper for HBase with advanced read features

Language: Java | License: Apache-2.0

TextClassify

Chinese text classifier; simple to train, with multiple models to choose from.

Language: Python

TextRank

Python implementation of the TextRank algorithm (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) for automatic keyword extraction and summarization, using Levenshtein distance as the relation between text units.

Language: Python

TextRank4ZH

Automatically extract keywords and summaries from Chinese text

Language: Python | License: MIT

textstat

Calculate statistics of text

Language: Python