asitang's repositories
sklearn-hierarchical-classification
Hierarchical classification module based on scikit-learn's interfaces
getting-started-with-git-and-github
Explaining Git and GitHub.
markdown-cheatsheet
Markdown Cheatsheet for GitHub Readme.md
nutch
Mirror of Apache Nutch
parser-indexer
Metadata Parser and Solr Indexer. For the Python equivalent, check out https://github.com/USCDataScience/parser-indexer-py
pdftabextract
A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents.
pycel
A library for compiling Excel spreadsheets to Python code and visualizing them as a graph
pytesseract
A Python wrapper for Google Tesseract
shangridocs
Document exploration tool
soft_cosine
Exploration of Soft Cosine measure in document similarity computation tasks
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on metadata features.
word2vec
Automatically exported from code.google.com/p/word2vec