jcpeterson / openwebtext

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
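
As a rough illustration of working from dump files rather than the API, the sketch below reads newline-delimited JSON submission records and keeps the outbound links. It is not the repository's actual code. The function name `extract_urls`, the field names `url`, `score`, and `is_self`, and the default score threshold of 3 are all assumptions made for this example.

```python
import json
from typing import Iterable, Iterator


def extract_urls(lines: Iterable[str], min_score: int = 3) -> Iterator[str]:
    """Yield unique outbound URLs from newline-delimited JSON submissions.

    Assumes each record is a JSON object with `url`, `score`, and `is_self`
    fields; these names are assumptions for this sketch.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            post = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(post, dict):
            continue  # skip records that are not JSON objects
        url = post.get("url")
        if not url or post.get("is_self"):
            continue  # skip text-only posts and records without a link
        if (post.get("score") or 0) < min_score:
            continue  # skip low-scoring submissions
        if url in seen:
            continue  # skip links already yielded
        seen.add(url)
        yield url
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, so a large dump is processed one record at a time instead of being loaded into memory.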
