ajwallacemusic / semantic-movie-search

A txtai Python app for searching movies semantically

semantic-movie-search

Semantic Movie Search is a simple Python app powered by Elasticsearch and txtai.

It takes user input, fetches the top 50 Elasticsearch results, then runs a txtai semantic similarity function over the top 500 Elasticsearch results, reranks them, and returns the new top 50 so the two result lists can be compared.
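
The reranking step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `rerank`, `score_fn`, and `overlap_scores` are hypothetical names. The scorer is assumed to return `(index, score)` pairs, and the toy word-overlap scorer only stands in for a semantic model so that the example runs without one.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, texts) and returns (index_into_texts, score) pairs.
Scorer = Callable[[str, Sequence[str]], List[Tuple[int, float]]]


def rerank(query: str, texts: Sequence[str], score_fn: Scorer, top_k: int = 50) -> List[str]:
    """Reorder candidate texts by similarity score and keep the best top_k."""
    scored = score_fn(query, texts)
    # Sort by score, highest first; ties keep their original (Elasticsearch) order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [texts[index] for index, _ in scored[:top_k]]


def overlap_scores(query: str, texts: Sequence[str]) -> List[Tuple[int, float]]:
    """Toy stand-in scorer: fraction of query words that appear in each text."""
    words = set(query.lower().split())
    return [
        (i, len(words & set(text.lower().split())) / max(len(words), 1))
        for i, text in enumerate(texts)
    ]


# Stand-in for the Elasticsearch candidates (the app reranks the top 500).
candidates = [
    "A detective hunts a serial killer",
    "Two robots fall in love in space",
    "A lonely robot cleans up Earth",
]
reranked = rerank("lonely robot", candidates, overlap_scores, top_k=2)
print(reranked)
```

With txtai installed, `score_fn` could be replaced by a call such as `Embeddings.similarity(query, texts)`, which returns `(id, score)` pairs in the same shape. Which txtai function the app itself uses is not stated here.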


Semantic Movie Search Demo


Setup/Installation

You need Python installed, along with a few dependencies: Streamlit (for running the app), Elasticsearch, and txtai.

This project assumes you have a local Elasticsearch cluster running on port 9200. You can run a local cluster and set up a movies index via docker-compose from the simple-reranker project repo. The only change you need to make to the Elasticsearch cluster is to add the semantic_search_test search template, which is located in the root of this project.

You can add it in Kibana with the following command:

PUT _scripts/semantic_search_test
{
  "script": {
    "lang": "mustache",
    "source": """
{
  "query": {
    "multi_match": {
      "fields": [
        "title",
        "description",
        "genres.name"
      ],
      "query": "{{query_string}}"
    }
  }
}
    """
  }
}
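
Once the template is stored, it can be invoked by id through Elasticsearch's search template endpoint. The following is a minimal sketch using only the standard library. The index name `movies`, the helper names, and the example query are assumptions for illustration; the host matches the local cluster on port 9200 described above, with no authentication assumed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # local cluster, as assumed by this project
INDEX = "movies"                   # assumed index name
TEMPLATE_ID = "semantic_search_test"


def build_template_request(query_string: str) -> dict:
    """Build the body for POST /<index>/_search/template."""
    return {"id": TEMPLATE_ID, "params": {"query_string": query_string}}


def run_template(query_string: str) -> list:
    """Send the request and return the raw hits (requires a running cluster)."""
    body = json.dumps(build_template_request(query_string)).encode("utf-8")
    request = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search/template",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["hits"]["hits"]


# Building the body needs no cluster; run_template() does.
request_body = build_template_request("space adventure")
print(json.dumps(request_body))
```

The `query_string` key in `params` is substituted into the `{{query_string}}` placeholder of the stored template.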

Improvements

Searching and reranking is not fast, because a similarity function is run against each Elasticsearch result, essentially converting text to vector embeddings on the fly. This works well for a proof of concept, but ideally you would convert the data to vectors ahead of time and store them in a new Elasticsearch index or a vector database.
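
The idea of precomputing vectors can be sketched as follows: each document is embedded once at index time, so a query needs only one embedding plus cheap vector comparisons. This is a rough illustration under stated assumptions. `embed` is a hypothetical toy function standing in for a real embedding model, and the in-memory list stands in for an Elasticsearch vector field or a vector database.

```python
import math
from typing import Dict, List, Tuple

VOCAB = ["robot", "space", "love", "detective", "killer"]  # toy vocabulary


def embed(text: str) -> List[float]:
    """Toy embedding: count of each vocabulary word (stand-in for a model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_index(docs: Dict[str, str]) -> List[Tuple[str, List[float]]]:
    """Index time: embed every document once and keep the vectors."""
    return [(doc_id, embed(text)) for doc_id, text in docs.items()]


def search(index: List[Tuple[str, List[float]]], query: str, top_k: int = 50) -> List[str]:
    """Query time: embed only the query, then compare against stored vectors."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


docs = {
    "m1": "a detective hunts a killer",
    "m2": "a robot finds love in space",
    "m3": "a robot cleans up earth",
}
index = build_index(docs)
results = search(index, "robot love", top_k=2)
print(results)
```

The expensive step, `embed`, runs once per document in `build_index` rather than once per document per query, which is the cost the on-the-fly approach pays.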
