haurhi / S2QA

Get answers to research questions from 200M+ papers

📚🤖 S2QA: Question Answering on research papers from Semantic Scholar

Have you ever wondered what research papers have to say about your burning questions? Look no further than Semantic Scholar Q&A with GPT-3! 🙌

This Python script takes a question and uses Semantic Scholar and GPT-3 to generate an answer based on the content of the top-ranked research papers. 🤖🔍

  • s2qa_nb.ipynb - main notebook
  • utils.py - contains the functions for search and GPT-3 prompting
  • s2qa_sources_langchain.ipynb - gets better answers with LangChain map-reduce and also returns the sources of the results, but is very expensive to run
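To make the cost difference concrete, here is a minimal sketch of the map-reduce pattern in plain Python. It is not the notebook's code and does not use LangChain's API: `llm` is a hypothetical callable standing in for a GPT-3 call, and `map_reduce_answer` and `CountingLLM` are names invented for illustration. The point is the call count: one call per paper plus one to combine, versus a single call when all paper text is stuffed into one prompt.

```python
from typing import Callable, List


def map_reduce_answer(question: str, docs: List[str], llm: Callable[[str], str]) -> str:
    """Answer `question` by querying each document separately, then combining."""
    # Map step: one LLM call per document.
    partial_answers = [
        llm(f"Context: {doc}\nQuestion: {question}\nAnswer using only the context:")
        for doc in docs
    ]
    # Reduce step: one more call that merges the partial answers.
    combined = "\n".join(f"- {a}" for a in partial_answers)
    return llm(f"Partial answers:\n{combined}\nQuestion: {question}\nFinal answer:")


class CountingLLM:
    """Hypothetical stand-in for a GPT-3 call; it only counts invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer {self.calls}"


fake_llm = CountingLLM()
final = map_reduce_answer(
    "How does iron supplementation affect anemia?",
    ["paper A", "paper B", "paper C"],
    fake_llm,
)
print(fake_llm.calls)  # 3 map calls + 1 reduce call
```

With N papers this pattern makes N + 1 model calls, which is why it costs more than the single-prompt approach.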

Examples

Answers with sources and langchain mapreduce

(Screenshot: s2 with langchain and sources)

Answers with regular "stuffing" context

>> query = "How does iron supplementation affect anemia?"

>> answer_question(df, question=query, debug=False)

'Iron supplementation can reduce anemia in pregnant women with mild or no anemia, but it can also increase the risk of neonatal jaundice. Iron supplementation can also improve iron stores and decrease anemia in non-pregnant women, but it can also increase the risk of diarrhea. Good adherence and initiation of supplementation before conception are needed to reduce anemia during early pregnancy.'

>> query = "What are the effects of sleep training on infants?"

>> answer_question(df, question=query, debug=False)

'Sleep training can lead to improved sleeping patterns, decreased parental stress, and increased parental competence. It can also lead to improved sleep efficiency, sleep onset latency, and sleep duration.'

Requirements 🧰

These can be added in constants.py.

The main third-party package requirements are tiktoken, openai, transformers and langchain.

To install all the required packages:

pip install -r requirements.txt

Pipeline 🚀

1️⃣ Searching : We begin by searching the vast and ever-growing database of Semantic Scholar to find the most up-to-date and relevant papers and articles related to your question.
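As a rough illustration of the search step, the sketch below builds a request for the paper-search endpoint of the Semantic Scholar Academic Graph API and filters the response. It is not taken from utils.py: the function names, the chosen `fields`, and the decision to drop papers without an abstract are assumptions made for this example. The network call is defined but not executed, and the demonstration uses a hand-written payload shaped like the API response.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Paper-search endpoint of the Semantic Scholar Academic Graph API.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query: str, limit: int = 20) -> str:
    """Build the GET URL for a keyword search (field selection is an assumption)."""
    params = {"query": query, "limit": limit, "fields": "title,abstract,url,year"}
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_results(payload: Dict) -> List[Dict]:
    """Keep only papers with an abstract, assuming abstracts serve as answer context."""
    return [p for p in payload.get("data", []) if p.get("abstract")]


def search_papers(query: str, limit: int = 20) -> List[Dict]:
    """Perform the HTTP request (needs network access; not called below)."""
    with urllib.request.urlopen(build_search_url(query, limit), timeout=30) as resp:
        return parse_results(json.load(resp))


# Offline demonstration with a made-up payload.
sample = {
    "data": [
        {"title": "Paper A", "abstract": "Iron helps.", "url": "https://example.org/a", "year": 2020},
        {"title": "Paper B", "abstract": None, "url": "https://example.org/b", "year": 2021},
    ]
}
papers = parse_results(sample)
url = build_search_url("iron anemia", limit=5)
print(url)
print([p["title"] for p in papers])
```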

2️⃣ Re-Ranking : We then use SPECTER to embed these papers and re-rank the search results, so that the most informative and relevant papers appear at the top.
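The re-ranking idea can be sketched as sorting papers by the similarity between a query embedding and each paper embedding. The example below assumes cosine similarity as the scoring function, which the README does not state, and uses toy 3-dimensional vectors in place of real SPECTER embeddings (which are 768-dimensional). `cosine` and `rerank` are illustrative names, not functions from utils.py.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(
    query_vec: Sequence[float],
    paper_vecs: Sequence[Sequence[float]],
    titles: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (title, score) pairs sorted from most to least similar to the query."""
    scored = [(t, cosine(query_vec, v)) for t, v in zip(titles, paper_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings of the query and three papers.
query_vec = [1.0, 0.0, 0.0]
paper_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
titles = ["unrelated", "on topic", "partly related"]

ranking = rerank(query_vec, paper_vecs, titles)
print([title for title, _ in ranking])
```

Swapping in a different embedding model only changes how the vectors are produced; the sorting step stays the same.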

3️⃣ Answering : Finally, we use the powerful natural language processing capabilities of GPT-3 to generate informative and accurate answers to your question, using custom prompts to ensure the best results.
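For the answering step, the "stuffing" approach places as much paper text as fits into a single prompt. The sketch below shows one way this could work; the prompt wording, the function names, and the word-count tokenizer are all assumptions for illustration. A real implementation would count tokens with a proper tokenizer such as tiktoken and send the prompt to GPT-3.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_stuffed_prompt(question: str, abstracts: List[str], max_context_tokens: int) -> str:
    """Add abstracts in ranked order until the next one would exceed the budget."""
    context: List[str] = []
    used = 0
    for abstract in abstracts:
        cost = count_tokens(abstract)
        if used + cost > max_context_tokens:
            break
        context.append(abstract)
        used += cost
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


abstracts = [
    "Iron supplementation reduced anemia.",
    "Sleep training improved sleep.",
    "A third long abstract here.",
]
prompt = build_stuffed_prompt(
    "How does iron supplementation affect anemia?", abstracts, max_context_tokens=8
)
print(prompt)
```

Because papers are added in ranked order, a tight budget drops the lowest-ranked papers first, which is one reason the re-ranking step matters.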

Customizable 🖊️

  • Try other open embedding models on Hugging Face to see whether they give better re-ranking results.

  • Try other prompts or refine the current prompt to get even better answers.

TODO 👈

  • Add citations to the statements generated by GPT-3. As we have links to the actual papers, this shouldn't be hard to do. See s2qa_sources_langchain.ipynb
  • Evaluate on a set of questions and report the results.

Languages

Jupyter Notebook 79.7%, Python 20.3%