anakin87 / who-killed-laura-palmer

Simple Question Answering system based on data crawled from the Twin Peaks Wiki. It is built using 🔍 Haystack, an awesome open-source framework for building search systems that work intelligently over large document collections.

title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
app_file: app.py
pinned: false
license: apache-2.0

Who killed Laura Palmer?

🗻🗻 Twin Peaks Question Answering system

WKLP is a simple Question Answering system based on data crawled from the Twin Peaks Wiki. It is built using 🔍 Haystack, an awesome open-source framework for building search systems that work intelligently over large document collections.


Project architecture 🧱

[Figure: Project architecture diagram]
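Haystack extractive QA systems like this one typically pair a retriever, which narrows the document collection to a few candidates, with a reader, which extracts an answer from them. The sketch below is a toy, dependency-free illustration of that flow, not Haystack code: term overlap stands in for the real retriever, sentence-level overlap stands in for the neural reader, and the names `retrieve`, `read`, `answer`, `DOCS` and `STOPWORDS` are hypothetical.

```python
import re

# Toy corpus standing in for the crawled wiki pages (short illustrative sentences, not the real data).
DOCS = {
    "laura": "Laura Palmer was a student at Twin Peaks High School. She was the homecoming queen.",
    "cooper": "Dale Cooper is an FBI special agent. Cooper came to Twin Peaks to investigate a murder.",
    "log": "The Log Lady carries a log. She lives in Twin Peaks.",
}

# Hypothetical minimal stopword list so that question words do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "was", "to", "of", "in", "at", "who", "what", "she", "he"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def retrieve(query: str, docs: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank document ids by how many distinct query terms each document contains."""
    q = set(tokenize(query))
    scores = {doc_id: len(q & set(tokenize(text))) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:top_k]


def read(query: str, text: str) -> tuple[str, int]:
    """Toy reader: return the sentence sharing the most terms with the query, plus its score."""
    q = set(tokenize(query))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    best = max(sentences, key=lambda s: len(q & set(tokenize(s))))
    return best, len(q & set(tokenize(best)))


def answer(query: str, docs: dict[str, str]) -> str:
    """Retriever-reader pipeline: read only the retrieved candidates, keep the best-scoring answer."""
    candidates = [read(query, docs[d]) for d in retrieve(query, docs)]
    return max(candidates, key=lambda c: c[1])[0]


print(answer("Who is the FBI special agent?", DOCS))
```

The point of the split is cost: the reader (the expensive neural model in a real system) only sees the few documents the cheap retriever returns, not the whole collection.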


What can I learn from this project? πŸ“š

  • How to quickly ⌚ build a modern Question Answering system using 🔍 Haystack
  • How to generate questions based on your documents
  • How to build a nice Streamlit web app to show your QA system
  • How to optimize the web app to 🚀 deploy in 🤗 Spaces
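Haystack's question generation is model-based; the toy sketch below only illustrates the idea of turning document sentences into question/answer pairs. It swaps the model for a single hypothetical regular-expression template that handles simple "X is/was Y." sentences and always asks "Who". The names `PATTERN` and `generate_questions` are illustrative, not part of Haystack.

```python
import re

# Hypothetical template for simple copular sentences: "<Subject> is/was <predicate>."
# The subject must start with a capital letter; the predicate runs up to the final period.
PATTERN = re.compile(r"^(?P<subject>[A-Z][\w ]*?) (?P<verb>is|was) (?P<predicate>[^.]+)\.$")


def generate_questions(text: str) -> list[dict[str, str]]:
    """Turn each "X is/was Y." sentence into a {"question": "Who is/was Y?", "answer": "X"} pair."""
    pairs = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        match = PATTERN.match(sentence)
        if match:  # sentences that do not fit the template are skipped
            pairs.append({
                "question": f"Who {match['verb']} {match['predicate']}?",
                "answer": match["subject"],
            })
    return pairs


doc = ("Laura Palmer was the homecoming queen. "
       "Dale Cooper is an FBI special agent. "
       "Cooper drinks a lot of coffee.")
for pair in generate_questions(doc):
    print(pair["question"], "->", pair["answer"])
```

The third sentence has no "is"/"was" and is skipped; a model-based generator covers far more sentence shapes than one template, so this only shows the input/output shape of the task.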

[Figure: Web app preview]

Repository structure πŸ“

Within each folder, you can find more in-depth explanations.

Installation πŸ’»

To install this project locally, follow these steps:

  • git clone https://github.com/anakin87/who-killed-laura-palmer
  • cd who-killed-laura-palmer
  • pip install -r requirements.txt

To run the web app, simply type: streamlit run app.py

Possible improvements ✨

Project structure

  • The project is optimized to be deployed in Hugging Face Spaces and consists of an all-in-one Streamlit web app. In more structured production environments, I suggest dividing the software into three parts:

Reader

  • The reader model (deepset/roberta-base-squad2), running on CPU, is a good compromise between speed and accuracy. Better (and more computationally expensive) models are available, as described in the Haystack documentation.
  • You can also prepare a Twin Peaks QA dataset and fine-tune the reader model on it to improve accuracy, as explained in the Haystack tutorial on fine-tuning a model on your own data.
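Haystack's reader fine-tuning reads annotations in SQuAD format: a JSON file of contexts, questions and answers whose positions are given as character offsets. The sketch below builds one SQuAD 2.0-style record for a Twin Peaks passage and checks that the computed `answer_start` points back at the answer text. The helper `make_squad_example`, the passage and the id `wklp-0001` are hypothetical.

```python
import json


def make_squad_example(title: str, context: str, question: str, answer: str, qid: str) -> dict:
    """Wrap one question/answer pair in the SQuAD JSON layout (one article, one paragraph)."""
    start = context.find(answer)  # character offset of the first occurrence
    if start == -1:
        raise ValueError(f"answer {answer!r} not found in context")
    return {
        "title": title,
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": qid,
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
                "is_impossible": False,  # SQuAD 2.0 field for unanswerable questions
            }],
        }],
    }


context = "Laura Palmer was the homecoming queen of Twin Peaks High School."
dataset = {"version": "v2.0", "data": [make_squad_example(
    "Laura Palmer", context, "Who was the homecoming queen?", "Laura Palmer", "wklp-0001")]}

# Sanity check: slicing the context at the offset must give back the answer text.
ans = dataset["data"][0]["paragraphs"][0]["qas"][0]["answers"][0]
assert context[ans["answer_start"]:ans["answer_start"] + len(ans["text"])] == ans["text"]

print(json.dumps(dataset, indent=2))
```

Because `str.find` returns the first occurrence, answers that appear more than once in a passage need their offset set by hand.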

About

License: Apache License 2.0


Languages

Jupyter Notebook: 95.4%, Python: 4.6%