18F / doc_processing_toolkit

Python library to extract text from PDFs, falling back to OCR when text extraction fails.
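The fallback behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual API. `extract_with_ocr_fallback`, `text_extractor`, `ocr_extractor`, and `min_chars` are hypothetical names. The two extractors are passed in as plain functions, so any PDF text extractor or OCR engine could be plugged in. "Fails" is assumed here to mean that direct extraction raised an error or returned (almost) no text, as with a scanned PDF that has no text layer.

```python
from typing import Callable, Tuple

# Hypothetical type: a function that takes a PDF path and returns its text.
Extractor = Callable[[str], str]


def extract_with_ocr_fallback(
    pdf_path: str,
    text_extractor: Extractor,
    ocr_extractor: Extractor,
    min_chars: int = 1,
) -> Tuple[str, str]:
    """Return (text, method), where method is "text" or "ocr".

    Assumption: direct extraction "fails" if it raises an exception or yields
    fewer than `min_chars` non-whitespace characters.
    """
    try:
        text = text_extractor(pdf_path)
    except Exception:
        text = ""
    # Count only non-whitespace characters, so a page of blank lines still
    # counts as "no text".
    if len("".join(text.split())) >= min_chars:
        return text, "text"
    # Direct extraction produced nothing usable: fall back to OCR.
    return ocr_extractor(pdf_path), "ocr"


if __name__ == "__main__":
    # Stub extractors standing in for a real PDF text tool and OCR engine.
    def no_text_layer(path: str) -> str:
        return "   \n"

    def fake_ocr(path: str) -> str:
        return "Scanned page text"

    print(extract_with_ocr_fallback("scan.pdf", no_text_layer, fake_ocr))
    # ('Scanned page text', 'ocr')
```

Passing the extractors as arguments keeps the fallback decision separate from any particular PDF or OCR tool, which also makes the logic easy to test with stubs.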