Bitextor Team (bitextor)

Translation memories generator

Bitextor Team's repositories

bitextor

Bitextor generates translation memories from multilingual websites

Language: Python | License: GPL-3.0 | Stargazers: 283 | Issues: 30 | Issues: 159

bicleaner

Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

Language: Python | License: GPL-3.0 | Stargazers: 149 | Issues: 14 | Issues: 52

pdf-extract

PDF parser and converter to HTML

Language: Java | License: GPL-3.0 | Stargazers: 81 | Issues: 17 | Issues: 51

bicleaner-ai

Bicleaner fork that uses neural networks

Language: Python | License: GPL-3.0 | Stargazers: 35 | Issues: 12 | Issues: 20

bifixer

Tool to fix bitexts and tag near-duplicates for removal

Language: Python | License: GPL-3.0 | Stargazers: 28 | Issues: 7 | Issues: 11

warc2text

Extracts plain text, language identification and more metadata from WARC records

Language: C++ | License: MIT | Stargazers: 18 | Issues: 9 | Issues: 25

biroamer

Utility that helps you ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.

Language: Python | License: GPL-3.0 | Stargazers: 9 | Issues: 9 | Issues: 4

neural-document-aligner

Document aligner that uses neural technologies to find matches across bilingual documents

Language: Python | License: GPL-3.0 | Stargazers: 7 | Issues: 9 | Issues: 1

bicleaner-data

Repository for data models, dictionaries and more resources for Bicleaner

Language: Python | License: GPL-3.0 | Stargazers: 6 | Issues: 9 | Issues: 3
Language: Python | License: GPL-3.0 | Stargazers: 6 | Issues: 10 | Issues: 1

python-pdfextract

Python interface to pdf-extract, HTML extraction from PDF

Language: Python | License: NOASSERTION | Stargazers: 6 | Issues: 2 | Issues: 0

bicleaner-ai-data

Repository of Bicleaner AI models

License: NOASSERTION | Stargazers: 5 | Issues: 10 | Issues: 0

bicleaner-hardrules

Pre-filtering step for bicleaner

Language: Python | License: GPL-3.0 | Stargazers: 4 | Issues: 9 | Issues: 3

bitextor-data

Repository for data models, dictionaries and more resources for Bitextor

License: GPL-3.0 | Stargazers: 4 | Issues: 9 | Issues: 0

bitextor-neural

Bitextor Neural generates translation memories from multilingual websites using state-of-the-art Machine Learning tools

Language: Python | License: GPL-3.0 | Stargazers: 3 | Issues: 9 | Issues: 0

prevertical2text

Extracts plain text, language identification and more metadata from Spiderling prevertical files

Language: C++ | License: MIT | Stargazers: 2 | Issues: 9 | Issues: 1

vecalign

Improved Sentence Alignment in Linear Time and Space

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 3 | Issues: 0

loomchild-segment-py

Python module to interface with Java Loomchild sentence segmenter

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 3 | Issues: 1

monocleaner-data

Monocleaner models repository

License: GPL-3.0 | Stargazers: 1 | Issues: 9 | Issues: 0

bicleaner-ai-glove

Fork of glove-python to distribute binary builds

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

bitextor-testing-output

Repository for storing testing outputs from Bitextor

License: GPL-3.0 | Stargazers: 0 | Issues: 8 | Issues: 0

cld2

Compact Language Detector 2

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0

deferred-crawling

Reconstructs sentences using deferred crawling standoff annotations from Bitextor

Language: Python | License: MIT | Stargazers: 0 | Issues: 8 | Issues: 0

fastText

Library for fast text representation and classification.

Language: HTML | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

hunalign

Sentence aligner

Language: C++ | License: LGPL-3.0 | Stargazers: 0 | Issues: 2 | Issues: 0

python-apachetika

Python interface to Apache Tika, HTML extraction from PDF

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0