MaCoCu (macocu)

MaCoCu focuses on collecting monolingual and parallel data from the Internet, especially for under-resourced languages and DSI-specific data.

Home Page: https://macocu.eu/

MaCoCu's repositories

LanguageModels

Tools for training LMs

Language: Python | License: GPL-3.0 | Stargazers: 4 | Issues: 2 | Issues: 0

prevert

Iterator for the prevert format

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 3 | Issues: 0

BCMS-variant-classifier

A classification tool for discriminating between Bosnian, Croatian, Montenegrin, and Serbian

License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

DSI

Code for the DSI experiments in the MaCoCu project

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

HT-vs-MT

Source code for the EAMT 2022 paper "Automatic Discrimination of Human and Neural Machine Translation: A Study with Multiple Pre-Trained Models and Longer Context"

Language: Shell | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Manual-Checking-Web-Corpora-Guidelines

Guidelines for manual checking of web corpora

Language: JavaScript | Stargazers: 0 | Issues: 0 | Issues: 3

Monolingual-Curation

Repository for the Curation of Monolingual Data work package

Language: Jupyter Notebook | Stargazers: 0 | Issues: 2 | Issues: 1