A prototype bot for chatting over PDF documents.
Note that this uses BM25 for document retrieval, rather than a semantic approach with embeddings.
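To make the difference from embedding-based retrieval concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not this project's retrieval code, which lives in the pipeline itself. The function name `bm25_scores`, the whitespace tokenizer, and the `k1`/`b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula.

    Illustrative helper only: tokenization is a plain lowercase split,
    and k1 / b are common default values, not values taken from this project.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency; the "+ 1" keeps it positive.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates with k1 and is normalized by document length.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "transformers use attention over token sequences",
    "bm25 ranks documents by term overlap with the query",
    "pdf parsing extracts text from pages",
]
scores = bm25_scores("bm25 term ranking", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because the match is purely lexical, the query term "ranking" contributes nothing here: no document contains that exact token, even though one contains "ranks". An embedding-based retriever would typically bridge that kind of gap.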
This uses Poetry for dependency management. To install dependencies:
$ poetry install
You'll also need to create a .env file and add your OPENAI_API_KEY to it (see .env.example).
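As a rough sketch of what reading that file involves, the snippet below loads KEY=VALUE lines into the process environment using only the standard library. How this project actually loads .env is not shown here, so `load_env_file` is a hypothetical helper, not the project's code.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return loaded


load_env_file()
if "OPENAI_API_KEY" not in os.environ:
    print("OPENAI_API_KEY is not set; add it to .env")
```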
The command below runs the pipeline on the papers directory, which contains a few PDFs. It then starts a REPL where you can ask questions about the PDFs. To leave the Q&A loop, type "exit" or press Ctrl+C.
$ poetry run python haystack_pdf_bot/main.py --pdf_directory=papers
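The Q&A loop described above can be sketched roughly as follows. This is an illustration of the exit behaviour only, not the code in main.py: `run_repl` and `placeholder_answer` are hypothetical names, and the placeholder stands in for the real retrieval and generation pipeline.

```python
def run_repl(answer, read=input, write=print):
    """Ask for questions until the user types "exit" or presses Ctrl+C."""
    while True:
        try:
            question = read("Question: ").strip()
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C (or end of input) ends the session without a traceback.
            write("")
            break
        if question.lower() == "exit":
            break
        if not question:
            # Ignore empty input and prompt again.
            continue
        write(answer(question))


def placeholder_answer(question):
    # Stand-in for the real pipeline; it only echoes the question back.
    return f"(no pipeline attached) You asked: {question}"


if __name__ == "__main__":
    run_repl(placeholder_answer)
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal, for example by feeding it a fixed list of questions.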