jind11 / MedQA

Code and data for MedQA

Can I use the document retriever component only?

serenayj opened this issue

Hi,

Congrats on finishing such nice work! I would like to test my own encoder (document reader), so I want to use only the IR document retriever component. Could you tell me where to find this part of the code and how to run it? Thank you in advance!

I am sorry for the late reply. Thanks for reaching out to me! This code base provides the Elasticsearch-based IR baseline, and you can follow the readme file to set it up. For text (sentence or paragraph) retrieval specifically, you can refer to this file: https://github.com/jind11/MedQA/blob/master/IR/aristomini/solvers/textsearch.py
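For readers who have not used Elasticsearch before, here is a minimal sketch of what a top-K text retrieval request body can look like in the Elasticsearch query DSL. It is not taken from textsearch.py: the function name `build_search_body` and the field name `body` are assumptions for illustration, and the actual index layout in the repository may differ.

```python
from typing import Any, Dict


def build_search_body(query_text: str, top_k: int = 5, field: str = "body") -> Dict[str, Any]:
    """Build a full-text `match` query that asks for the top_k best-scoring documents.

    `field` is a hypothetical name for the indexed text field; check the
    index mapping used by the repository before relying on it.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    return {
        "size": top_k,  # number of hits to return
        "query": {"match": {field: query_text}},
    }


# Example: retrieve the 3 best-matching passages for a question.
body = build_search_body("first-line treatment for hypertension", top_k=3)
print(body)
```

The resulting dictionary is the JSON body one would send to an index's `_search` endpoint; the returned hits would then be the sentences or paragraphs passed on to the reader.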

Hi,

Thanks for answering my question!

I have a follow-up question. In your paper, where you describe fine-tuning the pre-trained BERT models, you write:
"Specifically, we construct the input sequence by concatenating [CLS], tokens in c, [SEP], tokens in qai, [SEP], where [CLS] and [SEP] are the classifier token and sentence separator in a pre-trained language model, respectively."
My understanding is that the context c is a concatenation of all the textbooks. Wouldn't concatenating the question, the answers, and the context c exceed the BERT token limit?

Here c is the top-K sentences/paragraphs retrieved from the textbooks, so we do not need to concatenate all of the textbooks.
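To make the input layout concrete, here is a minimal sketch of assembling [CLS], c, [SEP], question plus one answer option, [SEP] under a fixed token budget. It is an illustration only, not the authors' code: whitespace splitting stands in for a real WordPiece tokenizer, `build_input` is a hypothetical helper, 512 is the usual BERT limit rather than a value stated in this thread, and trimming the context (rather than the question) when the budget is exceeded is an assumption.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def build_input(passages: List[str], question: str, option: str, max_len: int = 512) -> List[str]:
    """Return [CLS] + context + [SEP] + question/option + [SEP], at most max_len tokens.

    `passages` are the top-K retrieved sentences/paragraphs, best first.
    Whitespace tokenization is a stand-in for a real subword tokenizer.
    """
    qa_tokens = (question + " " + option).split()
    # Three slots are reserved for [CLS] and the two [SEP] tokens.
    budget = max_len - 3 - len(qa_tokens)
    if budget < 0:
        raise ValueError("question and option alone exceed max_len")
    context_tokens: List[str] = []
    for passage in passages:
        context_tokens.extend(passage.split())
    # Assumption: trim the tail of the context so the question/option pair stays intact.
    context_tokens = context_tokens[:budget]
    return [CLS] + context_tokens + [SEP] + qa_tokens + [SEP]


seq = build_input(["a b c", "d e"], "what is x", "y", max_len=10)
print(seq)
```

With one such sequence per answer option, only the K retrieved passages have to fit in the window, not the full textbook collection.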