oscar-project / oscar-tools

The original tooling for the OSCAR corpus rewritten in Rust

Port from Ungoliant: `rebuild` command

Uinelj opened this issue

This is the command that rebuilds the corpus from rebuild (Avro) files.
It might be trickier to port because it has more dependencies, iirc.

Code is here: https://github.com/oscar-project/ungoliant/blob/v1.2.3/src/processing/rebuild.rs