kbressem / medAlpaca

LLM finetuned for medical question answering

Stack Exchange HF datasets aren't available

griff4692 opened this issue

Could you share the scraping scripts, even if you can no longer share the processed data?

Thank you!

commented

Sorry for the slow replies; at the moment I don't find much time to code.
My crawler was loosely built on the scripts and notebooks from this issue: LAION-AI/Open-Assistant#191

It was a temporary, hacked-together crawler, and unfortunately I seem to have deleted it while cleaning up my local directory.

However, there are great Stack Exchange datasets on Hugging Face that correctly cite their sources and are more complete than my crawl. I would advise against crawling Stack Exchange yourself; just use the HF datasets instead.
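For anyone landing here with the same question, here is a minimal Python sketch of that advice. It streams a Stack Exchange dataset from the Hugging Face Hub with the `datasets` library and keeps the highest-scored answer per question. The dataset id `HuggingFaceH4/stack-exchange-preferences` and its field names (`question`, `answers`, `pm_score`, `text`) are assumptions, not something taken from this thread or the medAlpaca repo, so check the dataset card before relying on them. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def best_answer(row: Dict[str, Any], score_key: str = "pm_score") -> Optional[str]:
    """Return the text of the highest-scored answer in one question row, or None.

    Assumes (hypothetically) that each row has an "answers" list whose items
    carry a numeric score under `score_key` and the answer body under "text".
    """
    answers = row.get("answers") or []
    if not answers:
        return None
    top = max(answers, key=lambda a: a.get(score_key, 0))
    return top.get("text")


def to_qa_pairs(
    rows: Iterable[Dict[str, Any]], limit: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (question, best answer) pairs, skipping questions without answers."""
    count = 0
    for row in rows:
        if limit is not None and count >= limit:
            break
        answer = best_answer(row)
        if answer is None:
            continue
        yield row["question"], answer
        count += 1


def stream_pairs(dataset_id: str, split: str = "train", limit: int = 5) -> List[Tuple[str, str]]:
    """Stream a Hub dataset instead of crawling the site (needs `pip install datasets`)."""
    # Imported lazily so the helpers above work without the library installed;
    # streaming avoids downloading the full dump, but still needs network access.
    from datasets import load_dataset

    rows = load_dataset(dataset_id, split=split, streaming=True)
    return list(to_qa_pairs(rows, limit=limit))
```

Usage, assuming that dataset id and schema: `stream_pairs("HuggingFaceH4/stack-exchange-preferences", limit=3)`. Streaming keeps memory use small, which matters because these dumps are large. If a different Stack Exchange dataset uses other column names, only `best_answer` and the `row["question"]` lookup need to change.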