Senthi1Kumar / DEMO-semantic-search-podcast

Semantic search on podcast transcripts

This project's origin is here.

(TODO: Add Description)

(TODO: Add demo video)

Vectorization module: sentence-transformers/msmarco-distilroberta-base-v2

Prerequisites

(TODO)

Setup instructions

  1. Set up Weaviate: docker-compose up -d*
  2. Install dependencies: pip install -r requirements.txt
  3. Import data: python3 import.py**
  4. Query data: Go to console.semi.technology in Chrome or Safari and connect to http://localhost:9999. Click on Query Module to start querying with GraphQL.

*If you cannot connect, change port 9999 in both docker-compose.yml and import.py to a different value (such as 8888).
**The import could take up to 3 hours 🙂
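
Before starting the long import, it can help to confirm that Weaviate is actually reachable on the chosen port. The following is a minimal sketch, not part of this repository: the helper names ready_url and is_ready are hypothetical, and it assumes the instance exposes Weaviate's readiness endpoint at /v1/.well-known/ready on the port from docker-compose.yml.

```python
import urllib.error
import urllib.request


def ready_url(host: str = "localhost", port: int = 9999) -> str:
    """Build the URL of Weaviate's readiness endpoint (hypothetical helper)."""
    return f"http://{host}:{port}/v1/.well-known/ready"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not ready.
        return False
```

Calling is_ready(ready_url()) after docker-compose up -d should return True once the container has finished starting. If the port was changed to 8888, pass port=8888 to ready_url.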

Usage instructions

Example Queries:

Suppose we want to listen to some Changelog episodes discussing GraphQL. We can list the matching episode titles (and their transcripts too) via a nearText search for the concept "Episode about graphql":

(Screenshot of the query)

The Changelog #255 is Why is GraphQL so cool?
The Changelog #297 is Prisma and the GraphQL data layer
The Changelog #316 is REST easy, GraphQL is here
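
A query like the one described above can be sketched as a small Python helper that builds the GraphQL text to paste into the Query Module. This is an illustration only: the class name Podcast and the property title are assumptions, since the real schema is defined in import.py, and near_text_query is a hypothetical helper rather than part of Weaviate.

```python
import json


def near_text_query(class_name, concept, properties, limit=5):
    """Build a Weaviate GraphQL Get query with a nearText filter.

    class_name and properties must match the schema created by import.py;
    the values used in the example below are assumptions.
    """
    # json.dumps yields a double-quoted list, which is also a valid GraphQL list of strings.
    concepts = json.dumps([concept])
    fields = " ".join(properties)
    return "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }" % (
        class_name,
        concepts,
        limit,
        fields,
    )


# Assumed class name and property, for illustration only.
query = near_text_query("Podcast", "Episode about graphql", ["title"], limit=3)
print(query)
```

Adding the transcript property (under whatever name the schema uses) to the properties list would return the transcripts alongside the titles.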

Well, that was quite simple. In fact, a podcast search engine could have provided the same results.
So how about we list some episodes about web development, but in the context of Python and not JavaScript?
In addition to nearText for the concept "Episode about web development", we'll also add the moveTo (for python) and moveAwayFrom (for javascript) arguments:

(Screenshot of the query)

The Changelog #301 is Python at Microsoft
The Changelog #229 is Python, Django, and Channels
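
The combination of nearText with moveTo and moveAwayFrom described above might look like the sketch below. As before, the class name Podcast, the property title, and the force value of 0.9 are assumptions chosen for illustration, and the helper functions are hypothetical; the query in the screenshot may use different values.

```python
import json


def move_clause(name, concept, force):
    """Render a moveTo or moveAwayFrom argument for nearText."""
    return "%s: {concepts: %s, force: %s}" % (name, json.dumps([concept]), force)


def near_text_with_moves(class_name, concept, toward, away, properties,
                         force=0.9, limit=5):
    """Build a Get query whose nearText search is pulled toward one concept
    and pushed away from another. The force value is an assumed default."""
    near_text = ", ".join([
        "concepts: %s" % json.dumps([concept]),
        move_clause("moveTo", toward, force),
        move_clause("moveAwayFrom", away, force),
    ])
    return "{ Get { %s(nearText: {%s}, limit: %d) { %s } } }" % (
        class_name,
        near_text,
        limit,
        " ".join(properties),
    )


# Assumed class name and property, for illustration only.
query = near_text_with_moves(
    "Podcast", "Episode about web development", "python", "javascript", ["title"]
)
print(query)
```

The force values control how strongly the search vector is shifted, so raising or lowering them is a reasonable first thing to try if the results lean too far toward or away from a concept.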

Let's say that listening to the GraphQL and Python episodes has inspired us to create a machine learning startup. We would now like to listen to CEOs and founders, but in the field of machine learning or data science rather than plain web development:

(Screenshot of the query)

The Practical AI #149 is Trends in data labeling (With CEO of Label Studio)
The Changelog #305 is Putting AI in a box at MachineBox (With founders of MachineBox)
The Practical AI #134 is Apache TVM and OctoML (With CEO and co-founder of OctoML)
The Practical AI #148 is Stellar inference speed via AutoNAS (With CEO and co-founder of Deci)
The Practical AI #141 is Towards stability and robustness (With CTO of BeyondMinds)

Dataset license

300 Podcast transcripts from Changelog
