cincarnato / youtube-semantic-search

Search through a YouTube video using embeddings and a vector database

Youtube Semantic Search

TL;DR:

  • Paste a YouTube video URL
  • The video is transcribed with OpenAI Whisper
  • The transcript segments are regrouped into chunks of 40 seconds so each chunk carries more context
  • Embeddings are created for each chunk with the OpenAI embeddings endpoint
  • The embeddings are saved in Supabase, which serves as the vector database
  • When searching, the query is converted to an embedding, then a Supabase Postgres function searches for similar chunks

Demo video: yt-semantic.mp4

Transcription

The transcription is done by a Python Flask app running OpenAI Whisper; the code lives in the transcription_backend folder.
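
The regrouping into 40-second chunks mentioned in the TL;DR can be sketched as follows. This is a minimal illustration, not the repository's actual code: the Segment class and the merge_segments function are hypothetical names, and the sketch assumes each transcript segment carries a start time, an end time, and text, as Whisper segments usually do.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container; times are seconds from the start of the video.
    start: float
    end: float
    text: str


def merge_segments(segments, window=40.0):
    """Group consecutive segments into chunks spanning at most `window` seconds."""
    chunks = []
    current = None
    for seg in segments:
        if current is None:
            current = Segment(seg.start, seg.end, seg.text.strip())
        elif seg.end - current.start <= window:
            # Still inside the window: extend the current chunk.
            current.end = seg.end
            current.text = f"{current.text} {seg.text.strip()}"
        else:
            # Window exceeded: close the chunk and start a new one.
            chunks.append(current)
            current = Segment(seg.start, seg.end, seg.text.strip())
    if current is not None:
        chunks.append(current)
    return chunks


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 12.5, "Welcome to the video."),
    Segment(12.5, 31.0, "Today we talk about embeddings."),
    Segment(31.0, 47.0, "First, some background."),
    Segment(47.0, 70.0, "Vectors encode meaning."),
]
chunks = merge_segments(segments)
for chunk in chunks:
    print(f"{chunk.start:.1f}-{chunk.end:.1f}: {chunk.text}")
```

Keeping the start time of each chunk is what lets a search result jump to the right moment in the video.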

Embeddings

The transcript chunks are converted to embeddings using the OpenAI embeddings API.
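
As a rough sketch of what such a call looks like, the snippet below builds a request for the OpenAI embeddings endpoint using only the standard library. The helper names, the placeholder API key, and the model name text-embedding-ada-002 are assumptions for illustration; the project may use a different model or a client library instead.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(texts, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) a POST request for a batch of chunk texts."""
    # The model name is an assumption; swap in whichever model the project uses.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_json):
    """Return the vectors from a response payload, ordered by input index."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(
    ["First chunk text", "Second chunk text"],
    api_key="sk-placeholder",  # placeholder, not a real key
)
# To actually send it (requires a valid key and network access):
#   with urllib.request.urlopen(request) as resp:
#       vectors = parse_embeddings(json.load(resp))
```

Sending all chunks of a video in one batched request keeps the number of API calls low; each returned vector is then stored alongside its chunk text and timestamps.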

Vector Database

The embeddings are stored in a Supabase database with the pgvector extension. A Postgres function performs the similarity search (more info here).
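
To make the similarity search concrete, here is a pure-Python sketch of the ranking such a function could perform. It assumes cosine similarity, which pgvector supports, although this README does not say which metric the project uses. The row fields and the match_count parameter are hypothetical, and in the real app this work happens inside Postgres rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, rows, match_count=3):
    """Return the `match_count` rows most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_embedding, row["embedding"]), row)
        for row in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"start": row["start"], "text": row["text"], "similarity": score}
        for score, row in scored[:match_count]
    ]


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
rows = [
    {"start": 0.0, "text": "intro", "embedding": [1.0, 0.0, 0.0]},
    {"start": 40.0, "text": "embeddings explained", "embedding": [0.0, 1.0, 0.0]},
    {"start": 80.0, "text": "vector databases", "embedding": [0.0, 0.6, 0.8]},
]
results = search([0.0, 1.0, 0.0], rows, match_count=2)
for hit in results:
    print(hit["start"], hit["text"], round(hit["similarity"], 3))
```

Running the comparison in the database avoids shipping every stored vector to the client, and pgvector can add an index to speed it up on larger tables.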

To run

Run the Python backend

> flask --app transcription_backend/server run

Run the front-end

> cd webapp
> npm run dev

Languages

Svelte 39.1%, Python 22.1%, TypeScript 21.2%, JavaScript 14.6%, HTML 2.5%, CSS 0.4%