Alex S. Liu's repositories
pdf2json-c
PDF2JSON is a conversion library based on XPDF (3.02) for high-performance, page-by-page conversion of PDF files to JSON and XML. It also supports compressing the output to minimize its size. PDF2JSON is available for Windows, OS X and Linux.
OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
simhash-cluster
A cluster implementation of simhash near-duplicate detection
caffe
Caffe: a fast framework for deep learning. For the most recent version, check out the dev branch. For the latest stable release, check out the master branch.
nltk
NLTK Source
TF_IDF
A Python implementation of the TF-IDF algorithm for document relevance search
pdf2json-nj
A PDF file parser that converts PDF binaries to text-based JSON, powered by a fork of PDF.JS
adversarial
Code and hyperparameters for the paper "Generative Adversarial Networks"
pdf2img
Python library that converts PDF pages into images
cws
Chinese Word Segmentation
txt2img
Converts text into various image formats for the purpose of training a neural network to classify individual characters
poppler
PDF rendering library based on the xpdf-3.0 code base