HPLT - High Performance Language Technologies (hplt-project)

A space that combines petabytes of natural language data with large-scale model training

Home page: hplt-project.org

Twitter: @hplt_eu

HPLT repositories

sacremoses

Python port of the Moses tokenizer, truecaser and normalizer

Language: Python | License: MIT | Stargazers: 483 | Issues: 13 | Issues: 79

OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

OpusTrainer

Curriculum training

Language: Python | License: MIT | Stargazers: 15 | Issues: 6 | Issues: 33

monolingual-multilingual-instruction-tuning

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

Language: Python | Stargazers: 8 | Issues: 8 | Issues: 0

data-analytics-tool

Data Analytics Tool

Language: JavaScript | Stargazers: 4 | Issues: 5 | Issues: 0

HPLT-MT-Models

This repository contains the configuration and scripts for HPLT MT model releases.

warc2text-runner

Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

ia-download

Internet Archive downloader

Language: Jupyter Notebook | Stargazers: 2 | Issues: 2 | Issues: 2

monotextor-slurm

A set of scripts to run a monotextor-like pipeline on Slurm-managed HPC clusters

Language: Rust | License: GPL-3.0 | Stargazers: 2 | Issues: 5 | Issues: 5

HPLT-WP4

Information and pipelines for WP4: language model training

Language: Python | License: CC0-1.0 | Stargazers: 1 | Issues: 0 | Issues: 0

document-aligner

TF-IDF-based document aligner from Bitextor

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

clianer

A lightweight command-line frontend to OpusCleaner

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Language: PHP | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

OpusFilter

OpusFilter - Parallel corpus processing toolkit

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

paracrawl-dashboard

Makeshift interface for managing ParaCrawl processing and exploring its outputs

Language: HTML | Stargazers: 0 | Issues: 2 | Issues: 0