Bitextor Team's repositories
pdf-extract
PDF parser and converter to HTML
bicleaner-ai
Bicleaner fork that uses neural networks
neural-document-aligner
Document aligner that uses neural technologies to find matches across bilingual documents
bicleaner-data
Repository for data models, dictionaries, and other resources for Bicleaner
bitextor-data
Repository for data models, dictionaries, and other resources for Bitextor
python-pdfextract
Python interface to pdf-extract for HTML extraction from PDF
bicleaner-ai-data
Repository of Bicleaner AI models
bicleaner-hardrules
Pre-filtering step for Bicleaner
bitextor-neural
Bitextor Neural generates translation memories from multilingual websites using state-of-the-art machine learning tools
prevertical2text
Extracts plain text, language identification, and other metadata from Spiderling prevertical files
loomchild-segment-py
Python module to interface with the Java Loomchild sentence segmenter
monocleaner-data
Monocleaner models repository
bicleaner-ai-glove
Fork of glove-python to distribute binary builds
bitextor-testing-output
Repository for storing testing outputs from Bitextor
deferred-crawling
Reconstructs sentences using deferred crawling standoff annotations from Bitextor
python-apachetika
Python interface to Apache Tika for HTML extraction from PDF