OSCAR (oscar-project)

The Open Super-large Crawled Aggregated coRpus

Home Page: https://oscar-project.org

Twitter: @oscarnlp

OSCAR's repositories

ungoliant

🕷️ The pipeline for the OSCAR corpus

Language: Rust | License: Apache-2.0 | Stargazers: 156 | Issues: 2 | Issues: 43

goclassy

An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.

Language: Go | License: Apache-2.0 | Stargazers: 85 | Issues: 9 | Issues: 2

download_oscar

Downloads all files of a given language from OSCAR (Open Super-large Crawled Aggregated coRpus)

Language: Python | License: MIT | Stargazers: 10 | Issues: 1 | Issues: 3

oscar-website

The website of the OSCAR Project

Language: TeX | License: Apache-2.0 | Stargazers: 10 | Issues: 4 | Issues: 10

corpus

Corpus issues.

oscar-tools

The original tooling for the OSCAR corpus rewritten in Rust

Language: Rust | License: Apache-2.0 | Stargazers: 3 | Issues: 2 | Issues: 16

oscar-blocklists

A compilation of multilingual URL blocklists

License: CC0-1.0 | Stargazers: 2 | Issues: 2 | Issues: 0

oscar-statistics

Compute statistics for OSCAR Monthly releases

Language: Rust | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

data-hub

Collaboration around OSCAR: data sourcing.

oscar-tools-go

Tooling for the OSCAR corpus

Language: Go | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

ut1-rs

A Rust library for the ut1-blocklist

Language: Rust | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 2

oscar-io

Readers/Writers for the OSCAR corpus

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 3