re-search / DocProduct

Medical Q&A with Deep Language Models

Continual training

weichen-huang opened this issue

Hello,
Thanks for sharing your code. It is greatly appreciated. I am trying to apply this project to answering COVID-19 related questions. So far, I have trained the BioBERT retriever model on a Kaggle dataset of COVID-19 QA pairs. My goal is for the model to keep training as new COVID-19 questions arise. I have a few questions about how to do this:

  1. Is there any way I can scrape COVID-19 related QA pairs from medical question answering websites?
  2. Is there any way I can continually train the model without running into catastrophic forgetting?

Thanks,
Weichen Huang.

Sounds interesting. Is there a link to the Kaggle COVID Q&A dataset? I can't seem to find it.

I believe we used an off-the-shelf Python web scraper.

I don't believe we ran into catastrophic forgetting.
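If forgetting does show up once you start training on new COVID-19 pairs, one common mitigation is rehearsal (replay): mix a share of the older QA pairs into every batch of new ones. The snippet below is only a rough sketch of that idea, not something taken from DocProduct. The function name `build_replay_batches`, its parameters, and the `(question, answer)` tuple format are all assumptions made for illustration.

```python
import random


def build_replay_batches(old_pairs, new_pairs, batch_size=8,
                         replay_fraction=0.5, seed=0):
    """Yield training batches that mix new QA pairs with replayed old ones.

    Hypothetical helper: old_pairs and new_pairs are lists of
    (question, answer) tuples; replay_fraction is the share of each
    batch drawn from old_pairs.
    """
    if not 0 <= replay_fraction < 1:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)

    # Slots per batch reserved for old (replayed) and new examples.
    n_replay = int(batch_size * replay_fraction) if old_pairs else 0
    n_new = batch_size - n_replay

    # Visit every new pair exactly once, in random order.
    new_shuffled = list(new_pairs)
    rng.shuffle(new_shuffled)

    for start in range(0, len(new_shuffled), n_new):
        fresh = new_shuffled[start:start + n_new]
        # Draw a fresh random sample of old pairs for each batch.
        replayed = rng.sample(old_pairs, min(n_replay, len(old_pairs)))
        batch = fresh + replayed
        rng.shuffle(batch)
        yield batch
```

Each batch could then be fed to whatever training step you already use for the retriever. The `replay_fraction` value is a knob to tune, not a recommendation.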

Hello,
Thanks for the quick reply. I used the QA pairs from this dataset: https://www.kaggle.com/xhlulu/covidqa.
Would it be possible if you sent me the code for the web scraper? It would really help me with my project.
Thanks,
Weichen.
P.S. email: weichen.huang.2020@gmail.com

A different member of the team handled that part, but it looks like he used this notebook: https://github.com/re-search/DocProduct/blob/master/notebooks/webmd_data_gather.ipynb

Ah ok. Thanks! Closing