glaucia86 / vector-search-azure-cosmos-db-postgresql

This sample shows how to build vector similarity search on Azure Cosmos DB for PostgreSQL using the pgvector extension and the multi-modal embeddings APIs of Azure AI Vision.

Home Page: https://sfoteini.github.io

Image similarity search on Azure Cosmos DB for PostgreSQL with pgvector

This project demonstrates how to build an image similarity search application that uses Azure Cosmos DB for PostgreSQL as a vector database and Azure AI Vision to generate embeddings. It is intended as a starting point for developing more sophisticated vector search solutions.

In this sample application, we will explore image similarity search on Azure Cosmos DB for PostgreSQL using the SemArt Dataset. This dataset contains approximately 21k paintings gathered from the Web Gallery of Art. Each painting comes with various attributes, like a title, description, and the name of the artist.

Prerequisites

Before you start, ensure that you have the following prerequisites installed and configured:

Set up your working environment

Before running the Python scripts and Jupyter Notebooks, you should:

  1. Clone this repository to have it available locally.

  2. Download the SemArt Dataset into the semart_dataset directory.

  3. Create a virtual environment and activate it.

  4. Install the required Python packages using the following command:

    pip install -r requirements.txt

  5. Create a .env file from the .env.sample file provided in this repository.
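
The .env file from step 5 holds the settings that the scripts read at run time. As a rough sketch of how such a file can be loaded with only the standard library (the samples themselves may rely on a package such as python-dotenv instead, and the key names below are hypothetical placeholders, not the keys defined in .env.sample):

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Read a .env file and copy its entries into os.environ."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    os.environ.update(values)
    return values


# Hypothetical keys for illustration only; use the names from .env.sample.
sample = (
    'VISION_ENDPOINT="https://example.cognitiveservices.azure.com"\n'
    "# a comment line\n"
    "POSTGRES_HOST=localhost\n"
)
print(parse_env(sample))
```

Calling load_env() from the repository root would then make the values available through os.environ to the rest of a script.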

How to use the samples

| Sample | Description |
| --- | --- |
| Data Preprocessing | Cleans up the SemArt Dataset and creates the final dataset that is utilized in our application. |
| Embeddings Generation | Generates vector embeddings for the images in the dataset using the Azure AI Vision Vectorize Image API and creates the final dataset that is utilized in the image search application. |
| Upload images to Azure Blob Storage | Creates an Azure Blob Storage container and uploads the paintings' images. |
| Insert data to Azure Cosmos DB for PostgreSQL | Creates a table in the Azure Cosmos DB for PostgreSQL cluster and populates it with data from the dataset. |
| Exact nearest neighbor search with pgvector | Demonstrates text-to-image and image-to-image search approaches, along with a simple method for metadata filtering. |
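
To give a feel for the Embeddings Generation step, here is a minimal sketch of how a request to the Vectorize Image API might be assembled with the standard library. The route, the API version string, and the "vector" response field are assumptions based on the preview REST API and may differ from what the notebooks use. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: preview API version; check the Azure AI Vision docs for the current one.
API_VERSION = "2023-02-01-preview"


def build_vectorize_image_request(endpoint, key, image_url):
    """Build (but do not send) a POST request that asks for an image embedding."""
    query = urllib.parse.urlencode(
        {"api-version": API_VERSION, "modelVersion": "latest"}
    )
    url = f"{endpoint.rstrip('/')}/computervision/retrieval:vectorizeImage?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def vectorize_image(endpoint, key, image_url):
    """Send the request and return the embedding as a list of floats."""
    request = build_vectorize_image_request(endpoint, key, image_url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the response JSON carries the embedding under "vector".
        return json.loads(response.read())["vector"]
```

In practice the endpoint and key would come from the .env file, and the returned vector would be stored alongside each painting's metadata.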

More samples will be added soon!

Resources

Blog Posts

| Title | Summary |
| --- | --- |
| Use the Azure AI Vision multi-modal embeddings API for image retrieval | Explore the basics of vector search and generate vector embeddings for images and text using the Azure AI Vision multi-modal embeddings APIs. |
| Generate embeddings with Azure AI Vision multi-modal embeddings API | Discover the art of generating vector embeddings for paintings' images using the Azure AI Vision multi-modal embeddings APIs in Python. |
| Store embeddings in Azure Cosmos DB for PostgreSQL with pgvector | Learn how to configure Azure Cosmos DB for PostgreSQL as a vector database and insert embeddings into a table using the pgvector extension. |
| Use pgvector for searching images on Azure Cosmos DB for PostgreSQL | Learn how to write SQL queries to search for and identify images that are semantically similar to a reference image or text prompt using pgvector. |
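
As an illustration of the kind of query these posts cover, the sketch below pairs a pgvector similarity query with a pure-Python equivalent of exact nearest neighbor search. The table name, column names, and the 1024-dimension embedding size are assumptions for illustration, not the schema used in this repository, and the pgvector extension is assumed to be enabled already on the cluster. The `<=>` operator is pgvector's cosine distance.

```python
import math

# Hypothetical schema; the real table in the samples may differ.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS paintings (
    id bigserial PRIMARY KEY,
    title text,
    artist text,
    embedding vector(1024)
);
"""

# Exact search with a simple metadata filter; parameters use psycopg-style names.
SEARCH_SQL = """
SELECT title, artist
FROM paintings
WHERE artist = %(artist)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(values):
    """Format a list of numbers as a pgvector text literal such as [1.0,0.5]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the quantity <=> orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_nearest(query, rows, k):
    """Brute-force search over (title, embedding) pairs, closest first."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([title for title, _ in exact_nearest([1.0, 0.1], rows, 2)])
```

The same query shape works for text-to-image and image-to-image search: only the source of the query vector changes (a text embedding or an image embedding).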

References

Feel free to experiment with the project and modify the code to meet your specific use cases and requirements!

About

License: MIT License


Languages

Jupyter Notebook 99.8%, Python 0.2%