nsantavas / OBQnA

OpenBook Question Answering System

OBQnA

An OpenBook Question 'n' Answer System

Introduction

OBQnA is a high-level, object-oriented Python package that aims to provide an easy and intuitive way to create an OpenBook Question ‘n’ Answer system.

The package parses PDF files with Apache Tika, splits the corpus into passages, and computes a dense vector representation of each passage with a Transformer NLP model. For each question, the system performs Dense Passage Retrieval using an efficient similarity search library (Faiss, ScaNN or Annoy) and extracts the answer from the retrieved passages.


Install

To install the dependencies, run pip install -r requirements.txt

  • Note: to run on a GPU, install CUDA first.
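
A quick way to confirm that a GPU is visible after installing CUDA is sketched below. It assumes the Transformer model runs on PyTorch, which this README does not state, and `cuda_available` is a hypothetical helper, not part of OBQnA.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device."""
    # Assumption: the model backend is PyTorch; adapt this check for another framework.
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    return torch.cuda.is_available()


print("GPU available:", cuda_available())
```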

Python code example

The following example uses J. R. R. Tolkien's The Lord of the Rings trilogy and The Hobbit.

For a more detailed explanation, please read the Documentation.


Parsing PDFs and performing some basic text cleaning

from obqna.process import PDFParser, Passages
from obqna.qa import QuestionAnswering

parser = PDFParser("../books/") # Path of PDFs
books = parser.parse()
books = parser.clean(books)
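
What "basic text cleaning" involves is not spelled out here. The sketch below shows the kind of rules that are commonly applied to text extracted from PDFs. `clean_text` and its three rules are illustrative assumptions, not the actual behaviour of `parser.clean`.

```python
import re


def clean_text(raw: str) -> str:
    """Apply a few generic clean-up rules to text extracted from a PDF."""
    # Re-join words hyphenated across a line break: "jour-\nney" -> "journey".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank out lines that contain nothing but a page number.
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*$", "", text)
    # Collapse every run of whitespace (including newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "The jour-\nney began.\n\n  12  \nIt was   long."
print(clean_text(sample))  # The journey began. It was long.
```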

Splitting the corpus into passages

passages = Passages()
corpus = passages.df2passages(books)
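
How `df2passages` splits the text is not described here. One common approach is a sliding window over words with some overlap, so that a sentence cut at a boundary still appears whole in a neighbouring passage. The sketch below illustrates that idea only; `split_into_passages`, `max_words` and `overlap` are hypothetical names, not OBQnA parameters.

```python
from typing import List


def split_into_passages(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into windows of at most max_words words, sharing overlap words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return passages


text = " ".join(str(i) for i in range(10))
print(split_into_passages(text, max_words=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```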

Calculate the vector representation of each passage and store the corresponding indices

searcher_type = "scann" # other choices: "faiss", "annoy"
qna = QuestionAnswering(searcher_type)
qna.prepare(corpus)
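
To make the retrieval step concrete, here is a toy sketch of dense retrieval in pure Python. It makes two loud simplifications: a hashed bag-of-words vector stands in for the Transformer encoder, and an exhaustive inner-product scan stands in for Faiss, ScaNN or Annoy. The names `embed`, `build_index` and `search` are hypothetical and are not part of the OBQnA API.

```python
import math
import re
import zlib
from typing import List, Tuple

DIM = 256  # size of the toy embedding space (arbitrary choice)


def embed(text: str) -> List[float]:
    """Toy stand-in for a Transformer encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(passages: List[str]) -> List[List[float]]:
    """Embed every passage once; row i of the index belongs to passages[i]."""
    return [embed(p) for p in passages]


def search(index: List[List[float]], question: str, k: int = 2) -> List[Tuple[int, float]]:
    """Exhaustive inner-product search; returns (passage index, score), best first."""
    q = embed(question)
    scores = [(i, sum(a * b for a, b in zip(q, row))) for i, row in enumerate(index)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


passages = [
    "Galadriel is the Lady of Lorien.",
    "Denethor is the father of Boromir.",
    "The inscription on the One Ring is in the Black Speech.",
]
index = build_index(passages)
for i, score in search(index, "Who is the father of Boromir?", k=1):
    print(passages[i], round(score, 3))
```

Because the vectors are normalised, the inner product equals cosine similarity. An approximate nearest-neighbour library replaces the exhaustive scan in `search` when the corpus is too large to compare against every passage.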

Ask questions

questions = [
    "Who is Galadriel?",
    "Who is Isildur?",
    "Who is Boromir's father?",
    "Who is Aragorn?",
    "Was the ring destroyed?",
    "What language is on the One Ring inscription?"
]

for question in questions:
    print(f"Question: {question}: ")
    results = qna.ask(question)
    print(f"Answer: {results['answer']}")
    print(10*'-')

Output:

Question: Who is Galadriel?: 
Answer: The Lady of Lorien
----------
Question: Who is Isildur?: 
Answer: Elendils son
----------
Question: Who is Boromir's father?: 
Answer: Lord Denethor
----------
Question: Who is Aragorn?: 
Answer: Heir of Isildur
----------
Question: Was the ring destroyed?: 
Answer: it perished from the world in the ruin of his first realm
----------
Question: What language is on the One Ring inscription?: 
Answer: Black Speech
----------

About

License: Apache License 2.0


Languages

  • Python 71.8%
  • Jupyter Notebook 18.6%
  • Dockerfile 4.0%
  • Batchfile 3.1%
  • Makefile 2.5%