ParaCrawl (paracrawl)

Home Page: paracrawl.eu

Twitter: @paracrawl

ParaCrawl's repositories

Language: C++ | License: Apache-2.0 | Stargazers: 23 | Issues: 20 | Issues: 2

corset

Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.

Language: SCSS | License: GPL-3.0 | Stargazers: 17 | Issues: 5 | Issues: 2

keops

Tool for manual evaluation of parallel sentences.

Language: PHP | License: GPL-3.0 | Stargazers: 12 | Issues: 6 | Issues: 3

DataCollection

Data collection, alignment and TAUS repository

Language: Python | License: Apache-2.0 | Stargazers: 8 | Issues: 3 | Issues: 3

cirrus-scripts

Scripts for running bitextor/paracrawl/europat jobs on cirrus.ac.uk

human-evaluations

Results of the human evaluation

Language: Rich Text Format | Stargazers: 5 | Issues: 7 | Issues: 2

synthesis

Data synthesis by contextualizing glossary translations

Language: Python | Stargazers: 5 | Issues: 4 | Issues: 0

embedding

Mine parallel corpora with embeddings

Language: Perl | Stargazers: 4 | Issues: 10 | Issues: 0

europat-scripts

Scripts for obtaining patent data

tmxutil

Tools to generate & filter Europat tmx files.

Language: Python | License: MIT | Stargazers: 3 | Issues: 5 | Issues: 3

opus-train

Automate download and training with OPUS corpora

Language: Shell | License: MIT | Stargazers: 2 | Issues: 4 | Issues: 1

b64filter

Program for operating on files that hold one Base64-encoded document per line

Domain_Adaptation

InDomain detection is a tool designed to extract in-domain data from large collections of data.

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 5 | Issues: 32

giashard

Sharding program for ParaCrawl

giawarc

Processing utilities for Internet Archive

corpus-issues

Open any ParaCrawl corpus-related issue here

Stargazers: 0 | Issues: 4 | Issues: 0

go-warc

A Go library for working with WARC files from Common Crawl

Language: Go | License: GPL-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
License: CC0-1.0 | Stargazers: 0 | Issues: 4 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 4 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 4 | Issues: 0