Stack Exchange HF datasets aren't available
griff4692 opened this issue · comments
Griffin Adams commented
Could you share the scraping scripts even if you can no longer share the processed data?
Thank you!
Shuyue Jia (Bruce) commented
Mark
Keno commented
Sorry for the infrequent replies; at the moment I don't find much time to code.
My crawler was loosely built on scripts/notebooks from this issue: LAION-AI/Open-Assistant#191
It was a temporary, hacked-together crawler, which I unfortunately seem to have removed while cleaning up my local directory.
However, there are great Stack Exchange datasets on Hugging Face that correctly cite their sources and are more complete than my crawl. I would advise against crawling Stack Exchange; just use one of the HF datasets instead.