cfoster0 / humongous-rs

A Rust pipeline for extracting HUMONGOUS, a dataset of web-based text extracted from Common Crawl and ready for multilingual language modeling.

This repository is not active


Languages

Rust: 100.0%