s2t2 / ReAInvent

YouTube Q&A chatbot using semantic search, large language models, and a REST API.

ReAInvent

This was our submission for CruzHacks 2023, and is soon to become a full-fledged project (maybe). We created a question-answering chatbot for YouTube videos using semantic search, large language models, and a REST API. The name is pronounced the same way as "reinvent".

The project is hosted live at https://reainvent.com/

What is it for?

We created this project to make learning more efficient by letting long-form videos be parsed down to text in a matter of seconds. Re-referencing the sections of a lecture where the professor goes over a specific type of problem, or finding where the professor talks about a specific quiz or grading policy, can be quite a pain for videos that last over an hour. Being able to find this information quickly, and being assured it is accurate, is incredibly convenient and useful for any student.

Beyond university students, this program can be used with any YouTube video, so long as it has an accurate transcript (nearly all YouTube videos have one thanks to auto-transcription). Whenever you find yourself scrubbing through a video to find a piece of information, ReAInvent is there to do it faster for you.

How does it work?

We use pytube and youtube_transcript_api to fetch the transcript for a given YouTube URL. We then run a semantic search in which the query is the question asked by the client and the document is the transcript. To perform the search, we use OpenAI's embedding models and sort the transcript by cosine similarity with respect to the query, which surfaces the parts of the video most relevant to the question. Finally, we pass those parts of the transcript as context, together with the original question, to GPT-DaVinci, OpenAI's largest LLM, to produce the accurate and digestible responses you see on the site.
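The ranking step can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the segment dictionaries (with `text`, `start`, and `embedding` keys), and the toy 3-dimensional vectors are all assumptions made for illustration. In the real pipeline the vectors would come from an OpenAI embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_segments(query_vec, segments, top_k=3):
    """Return the top_k segments most similar to the query, best first."""
    ranked = sorted(
        segments,
        key=lambda seg: cosine_similarity(query_vec, seg["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical transcript segments; toy vectors stand in for real embeddings.
segments = [
    {"text": "Welcome to the course.", "start": 0.0, "embedding": [1.0, 0.0, 0.0]},
    {"text": "Quizzes are worth 20% of the grade.", "start": 754.2, "embedding": [0.1, 0.9, 0.2]},
    {"text": "Let's solve a recurrence.", "start": 1810.5, "embedding": [0.0, 0.2, 0.9]},
]
query_vec = [0.0, 1.0, 0.1]  # stand-in for the embedded question

top = rank_segments(query_vec, segments, top_k=2)
for seg in top:
    print(seg["start"], seg["text"])
```

Keeping the `start` time alongside each segment is what later makes it possible to point the user at the right moment in the video.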

We also use prompt engineering to limit hallucinations (misinformation) from GPT, and include relevant timestamps so you can quickly rewatch the sections you are looking for.
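As a rough illustration of this idea, the sketch below builds a prompt that restricts the model to the retrieved excerpts and labels each excerpt with its timestamp. The prompt wording, the function names, and the segment shape (`text` and `start` in seconds) are assumptions, not the project's actual prompt.

```python
def format_timestamp(seconds):
    """Convert seconds into M:SS, or H:MM:SS for videos over an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_prompt(question, segments):
    """Assemble a prompt that grounds the answer in timestamped excerpts."""
    context = "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text']}" for seg in segments
    )
    return (
        "Answer the question using only the transcript excerpts below. "
        "If the answer is not in the excerpts, say that you do not know. "
        "Cite the timestamps of the excerpts you used.\n\n"
        f"Transcript excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical segment, as it might come out of the semantic search step.
prompt = build_prompt(
    "How much are quizzes worth?",
    [{"text": "Quizzes are worth 20% of the grade.", "start": 754.2}],
)
print(prompt)
```

Telling the model to admit when the excerpts do not contain the answer is one common way to discourage made-up responses, though it does not guarantee accuracy.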

Running in Development Mode

Create a virtual environment

  1. Start by navigating to the project directory
  2. Create the virtual environment
python3 -m venv ./venv
  3. Activate the virtual environment
On Windows:
.\venv\Scripts\activate.bat
On macOS or Linux:
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Start the API

python3 ./backend/server.py

Note: You need to provide an OpenAI API key in a .env file within the backend directory. OpenAI APIs are not free to use, and will require you to provide payment info to generate tokens past the free trial. The .env file should have the line openai_key = "{API_KEY}".
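For reference, here is a minimal, standard-library-only sketch of how a line such as `openai_key = "{API_KEY}"` could be read from the .env file. The helper name `load_env_value` is hypothetical, and the project itself may use a dedicated library for this instead.

```python
from pathlib import Path


def load_env_value(env_path, key):
    """Return the value for `key` in a simple KEY = "value" file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None


# Assumed location, based on the note above; None if the file is missing.
api_key = load_env_value("backend/.env", "openai_key")
```

Keeping the key in a .env file that is excluded from version control avoids committing the secret to the repository.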

Start the Website

When running in development mode, set this line at the top of App.js so the site can communicate with the API. By default, it is set to "/api".

const API_ENDPOINT = "";

Now, start the website.

cd frontend
npm start

Navigate to localhost:3000 in your browser to view the site.

Note

These instructions are for running the project in development mode. You can build the API and frontend for production in a variety of ways. Currently, ReAInvent is hosted on a Google Cloud VM, using Gunicorn to serve the backend and Nginx to serve the frontend.

About

License: GNU General Public License v3.0


Languages

Python 48.7%, JavaScript 30.1%, CSS 14.9%, HTML 6.2%