Need help on accessing the raw reddit data
Jianxin-MNM opened this issue
Hi,
Dolma is really fantastic work. I am currently trying to extend the data pipeline to more languages using the Reddit data. Could anyone help with the following?
- Could you share a working link or access method for the raw Reddit dataset?
- I have found some torrent links with .zst files from multiple archives. Could anyone share a sha256sum so that I can validate that my downloads completed correctly?
Cheers!
Apologies, but we are not planning to share the raw Reddit dataset.
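On the checksum question above: no reference digests are given in this thread, so the following is only a minimal sketch of how a downloaded .zst archive could be checked once a trusted SHA-256 value is available from whoever publishes the archive. The function names `sha256_of_file` and `matches_expected` are hypothetical helpers introduced for illustration; only the standard-library `hashlib` calls are real.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for
    multi-gigabyte archives.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest to an expected hex string.

    The expected value is an assumption: it must come from a source
    you trust, since this thread does not provide one.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_expected(path, expected_hex)` with the path to a downloaded archive and a published digest returns `True` only when the two agree. The result should be equivalent to comparing the output of the `sha256sum` command-line tool by hand.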