rbehnke / pile-literotica

Download, parse, and filter data from Literotica. Data-ready for The-Pile.

The-Pile-Literotica

Download, parse, and filter Literotica, data-ready for The-Pile.

Data are collected by spidering each category, then each story within it, then following up on every page of each story. No filtering is applied beyond extracting the body text.
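The crawl order described above (category, then story, then each page of the story) can be sketched as follows. This is a minimal illustration, not the repository's code: the three hooks `list_stories`, `list_pages`, and `extract_body` are hypothetical stand-ins for the HTTP fetching and HTML parsing, and the record fields `category`, `url`, and `text` are assumed, since the actual JSONL schema is not documented here.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def crawl(
    categories: Iterable[str],
    list_stories: Callable[[str], List[str]],
    list_pages: Callable[[str], List[str]],
    extract_body: Callable[[str], str],
) -> Iterator[Dict[str, str]]:
    """Walk category -> story -> page and yield one record per story.

    The three callables are hypothetical hooks; real HTTP fetching and
    HTML parsing are deliberately left out of this sketch.
    """
    seen = set()  # sketch-only choice: skip stories listed in several categories
    for category in categories:
        for story_url in list_stories(category):
            if story_url in seen:
                continue
            seen.add(story_url)
            # Follow up on every page of the story, keeping only the body text.
            bodies = [extract_body(page) for page in list_pages(story_url)]
            # Hypothetical record layout; the real field names may differ.
            yield {"category": category, "url": story_url, "text": "\n\n".join(bodies)}


def write_jsonl(records: Iterable[Dict[str, str]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Skipping story URLs that were already seen is a choice made for this sketch, in case a story is listed under more than one category; whether the real crawler deduplicates is not stated.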

✔ Saved to data/Literotica.jsonl
ℹ Saved 473,653 stories
ℹ Uncompressed filesize 12,736,536,394
ℹ Compressed filesize    4,426,369,159
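Because the uncompressed output is about 12.7 GB, it is best read line by line rather than loaded at once. The following is a minimal sketch, assuming `data/Literotica.jsonl` is standard JSON Lines (one object per line); `summarize_jsonl` is a hypothetical helper, not part of the repository, and the way the original tool computed the figures above is not shown.

```python
import json
import os
from typing import Tuple


def summarize_jsonl(path: str) -> Tuple[int, int]:
    """Return (record_count, size_in_bytes) for a JSON Lines file.

    Iterates one line at a time, so a multi-gigabyte file never has to
    fit in memory. Each non-blank line is parsed to confirm it is valid JSON.
    """
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a stray trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no} is not valid JSON") from exc
            count += 1
    return count, os.path.getsize(path)
```

For example, `summarize_jsonl("data/Literotica.jsonl")` would be expected to report a count matching the 473,653 stories above. Taking the reported sizes as bytes, the compressed file is about 34.8% of the uncompressed one (4,426,369,159 / 12,736,536,394), or roughly 26.9 KB of JSON per story.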

Data source temporarily hosted at

 > sha256sum Literotica.jsonl.zst
 3c6b968f851831c6345f175b394416f7521da3bacd90fdc827093f0d310bd4ef  Literotica.jsonl.zst
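The checksum above can also be verified without the `sha256sum` tool. This sketch uses Python's standard `hashlib`, reads the file in chunks so the roughly 4.4 GB archive never sits in memory, and compares the result against the published digest; `sha256_file` and `verify` are illustrative names, not part of the repository.

```python
import hashlib

# Published digest of Literotica.jsonl.zst, copied from the sha256sum output above.
EXPECTED = "3c6b968f851831c6345f175b394416f7521da3bacd90fdc827093f0d310bd4ef"


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    """Return True if the file's digest matches the expected checksum."""
    return sha256_file(path) == expected.lower()
```

For example, `verify("Literotica.jsonl.zst")` returns `True` only if the downloaded archive is byte-for-byte intact.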

Languages

Python 100.0%