Continual training

Question

weichen-huang opened this issue 4 years ago

Hello,
Thanks for sharing your code. It is greatly appreciated. I am trying to apply this project to answering COVID-19-related questions. So far, I have trained the BioBERT retriever model on a Kaggle dataset of COVID-19 QA pairs. My goal is to keep training the model as new questions about COVID-19 arise. I have a few questions about how to do this:

Is there any way I can scrape QA pairs with relation to COVID-19 from medical question answering websites?
Is there any way I can continually train the model without the problem of catastrophic forgetting?
Thanks,
Weichen Huang.

Santosh Gupta · Answer 1 · Fri Jun 19 2020 07:29:12 GMT+0800 (China Standard Time)

Sounds interesting. Is there a link to the Kaggle COVID Q&A dataset? I can't seem to find it.

I believe we used an off-the-shelf Python web scraper.

I don't believe we ran into catastrophic forgetting.
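For readers who do hit forgetting when fine-tuning on a stream of new QA pairs, one common mitigation is rehearsal (experience replay): each batch mixes new examples with examples sampled from the earlier training data. The sketch below is not something this project is known to do. `replay_batches` and its parameters are hypothetical names, and the batch size and replay fraction are arbitrary assumptions to tune.

```python
import random
from typing import Iterator, List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def replay_batches(
    old_pairs: Sequence[QAPair],
    new_pairs: Sequence[QAPair],
    batch_size: int = 8,
    replay_fraction: float = 0.5,
    seed: int = 0,
) -> Iterator[List[QAPair]]:
    """Yield batches that mix new QA pairs with replayed old ones.

    Hypothetical helper: every new pair is visited once, and each batch is
    topped up with old pairs sampled at random from the earlier data.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")

    rng = random.Random(seed)
    old_list = list(old_pairs)
    # Slots reserved for replayed examples; none if there is no old data.
    n_old = int(batch_size * replay_fraction) if old_list else 0
    n_new = batch_size - n_old  # always >= 1 because replay_fraction < 1

    shuffled_new = list(new_pairs)
    rng.shuffle(shuffled_new)

    for start in range(0, len(shuffled_new), n_new):
        batch = shuffled_new[start:start + n_new]
        # Sample old pairs without replacement within a single batch.
        batch.extend(rng.sample(old_list, min(n_old, len(old_list))))
        rng.shuffle(batch)
        yield batch
```

Each yielded batch would then be passed to whatever training step the model already uses. A higher `replay_fraction` protects older knowledge more strongly at the cost of slower adaptation to the new questions.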

Weichen Huang · Answer 2 · Fri Jun 19 2020 13:44:05 GMT+0800 (China Standard Time)

Hello,
Thanks for the quick reply. I used the QA pairs from this dataset: https://www.kaggle.com/xhlulu/covidqa.
Would it be possible for you to send me the code for the web scraper? It would really help me with my project.
Thanks,
Weichen.
P.S. email: weichen.huang.2020@gmail.com

Santosh Gupta · Answer 3 · Fri Jun 19 2020 16:22:42 GMT+0800 (China Standard Time)

A different member of the team handled that part, but it looks like he used this notebook: https://github.com/re-search/DocProduct/blob/master/notebooks/webmd_data_gather.ipynb
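As a rough illustration of what extracting QA pairs from a page can look like, here is a minimal sketch using only the Python standard library. It is not the code from the linked notebook. The `question` and `answer` CSS classes are hypothetical markup, so the selectors would need to be adapted to the real structure of whichever site is being scraped, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class QAPairParser(HTMLParser):
    """Collect (question, answer) pairs from elements carrying the
    hypothetical CSS classes "question" and "answer"."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._field: Optional[str] = None  # "question" or "answer" while inside one
        self._depth = 0                    # nesting depth inside the current field
        self._buffer: List[str] = []
        self._question: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._field is not None:
                self._buffer.append(" ")  # treat e.g. <br> as whitespace
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field in ("question", "answer"):
            if field in classes:
                self._field, self._depth, self._buffer = field, 1, []
                break

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth > 0:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if self._field == "question":
            self._question = text
        elif self._question is not None:
            self.pairs.append((self._question, text))
            self._question = None
        self._field = None


def extract_qa_pairs(html: str) -> List[Tuple[str, str]]:
    """Parse an HTML string and return the QA pairs found in it."""
    parser = QAPairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

An answer is only recorded when a question precedes it, so stray answer blocks without a matching question are skipped rather than mispaired.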

Weichen Huang · Answer 4 · Fri Jun 19 2020 20:13:44 GMT+0800 (China Standard Time)

Ah, OK. Thanks! Closing.