Can I use the dolma toolkit to process my own datasets?
Tendo33 opened this issue
Jinfeng Sun commented
I collected some data myself with a crawler, and I was wondering whether I could use the dolma toolkit to remove duplicates.
Luca Soldaini commented
Yes! You can use our `dolma dedupe` command. Please let us know if you have questions!
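To make the idea concrete, here is a minimal Python sketch of what paragraph-level exact deduplication does to a set of crawled pages. It is not the `dolma dedupe` implementation and does not show how that command is invoked or configured. The record fields (`id`, `text`, `source`) and the gzipped JSON Lines output are assumptions about the input format dolma works with, so check the dolma documentation for the exact schema. The helper names (`to_dolma_records`, `dedupe_paragraphs`, `write_jsonl_gz`) are hypothetical.

```python
import gzip
import hashlib
import json
from typing import Iterable, Iterator


def to_dolma_records(pages: Iterable[str], source: str) -> Iterator[dict]:
    """Wrap raw crawled texts as records with an ASSUMED minimal schema
    ('id', 'text', 'source'); verify the real field names in the dolma docs."""
    for i, text in enumerate(pages):
        yield {"id": f"{source}-{i}", "text": text, "source": source}


def dedupe_paragraphs(records: Iterable[dict]) -> Iterator[dict]:
    """Drop any non-empty paragraph (newline-separated) whose exact text
    already appeared earlier in the stream. An in-memory set of SHA-1
    digests stands in for whatever index the real tool uses."""
    seen: set[bytes] = set()
    for rec in records:
        kept = []
        for para in rec["text"].split("\n"):
            if not para.strip():
                kept.append(para)  # keep blank lines as-is
                continue
            digest = hashlib.sha1(para.encode("utf-8")).digest()
            if digest in seen:
                continue  # exact duplicate paragraph: skip it
            seen.add(digest)
            kept.append(para)
        yield {**rec, "text": "\n".join(kept)}


def write_jsonl_gz(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


if __name__ == "__main__":
    pages = ["Hello world\nShared footer", "Another page\nShared footer"]
    for rec in dedupe_paragraphs(to_dolma_records(pages, "my_crawl")):
        print(rec)
```

In the usage at the bottom, the second page loses its repeated "Shared footer" paragraph because the first page already contained it. Note that this sketch keeps every seen digest in memory, which is fine for a toy example but would not scale to a large crawl.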