Can I use document retriever component only?

serenayj opened this issue 2 years ago

Yanjun Gao (Serena) commented 2 years ago

Hi,

Congrats on finishing such nice work! I would like to test my own encoder (document reader) using only the IR document retriever component. Could you tell me where to find that part of the code and how to run it? Thank you in advance!

Di Jin · Answer 1 · Wed Jun 29 2022 15:13:15 GMT+0800 (China Standard Time)

Sorry for the late reply, and thanks for reaching out! This code base provides the Elasticsearch-based IR baseline, and you can follow the README to set it up. For text (sentence or paragraph) retrieval specifically, refer to this file: https://github.com/jind11/MedQA/blob/master/IR/aristomini/solvers/textsearch.py
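For readers who only want the retrieval step, here is a minimal sketch of what a top-K text query against an Elasticsearch index could look like. It is not taken from `textsearch.py`. The field name `text`, the index name, and the choice to query with the question plus one answer option are all assumptions for illustration. The `client` argument is assumed to expose `search(index=..., body=...)` as the 7.x Python client does.

```python
from typing import Any, Dict, List, Tuple


def build_query(question: str, option: str, top_k: int = 5) -> Dict[str, Any]:
    """Build a full-text match query from the question plus one answer option."""
    return {
        "size": top_k,  # number of hits to return
        # "text" is a hypothetical field name; use whatever field your index stores passages in.
        "query": {"match": {"text": f"{question} {option}"}},
    }


def retrieve(
    client: Any, index: str, question: str, option: str, top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs for the top-K hits, best first.

    `client` is assumed to behave like an Elasticsearch client object,
    i.e. to provide search(index=..., body=...) and return the usual
    {"hits": {"hits": [...]}} response structure.
    """
    response = client.search(index=index, body=build_query(question, option, top_k))
    return [(hit["_source"]["text"], hit["_score"]) for hit in response["hits"]["hits"]]
```

The returned passages can then be passed to any reader model, which makes it possible to swap in a different encoder without touching the retriever.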

Yanjun Gao (Serena) · Answer 2 · Wed Jul 13 2022 01:23:57 GMT+0800 (China Standard Time)

Hi,

Thanks for answering my question!

A follow-up question: where your paper describes fine-tuning the pre-trained BERT models, you mention that:
Specifically, we construct the input sequence by concatenating [CLS], tokens in c, [SEP], tokens in qai, [SEP], where [CLS] and [SEP] are the classifier token and sentence separator in a pre-trained language model, respectively
My understanding is that the context c is a concatenation of all the textbooks. Wouldn't concatenating the question, the answers, and the context c exceed the BERT token limit?

Di Jin · Answer 3 · Wed Jul 13 2022 07:11:05 GMT+0800 (China Standard Time)

Here c is the top-K retrieved sentences/paragraphs from the textbooks, so we do not need to concatenate all the textbooks.
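To make the length budget concrete, here is a rough sketch of assembling `[CLS] c [SEP] qa [SEP]` from top-K retrieved passages while staying within a fixed token limit. Whitespace splitting stands in for BERT's WordPiece tokenizer, and truncating the context (rather than the question and answer) is an assumption; the thread does not say how the paper handles overflow. The function name and the default limit of 512 are illustrative.

```python
from typing import List


def build_input(passages: List[str], question_answer: str, max_len: int = 512) -> List[str]:
    """Concatenate [CLS], context tokens, [SEP], question+answer tokens, [SEP].

    Passages are added in retrieval order and the context is cut off once
    the token budget is used up. Whitespace tokenization is a stand-in for
    a real subword tokenizer.
    """
    qa_tokens = question_answer.split()
    budget = max_len - len(qa_tokens) - 3  # reserve [CLS] and two [SEP]
    if budget < 0:
        raise ValueError("question and answer alone exceed max_len")

    context: List[str] = []
    for passage in passages:
        remaining = budget - len(context)
        if remaining <= 0:
            break
        context.extend(passage.split()[:remaining])

    return ["[CLS]"] + context + ["[SEP]"] + qa_tokens + ["[SEP]"]
```

With a real tokenizer the same budgeting applies, only counted in subword tokens instead of whitespace-separated words.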