Serverless Semantic Video Search Using a Vector Database and a Multi-Modal Generative AI Embeddings Model

You can find the blog post related to this repository here: Implement serverless semantic search of image and live video with Amazon Titan Multimodal Embeddings.

Deploying the infrastructure requires sufficient AWS privileges.

Warning

This example is for experimental purposes only and is not production ready. Deploying this sample can incur costs. Remove the infrastructure with the provided scripts when you no longer need it.

AWS Account Prerequisites
Deploy to Amplify
Local Development Prerequisites
Local Build
Clean Up
Usage Instructions
Solution Walkthrough

AWS Account Prerequisites

Enable model access for Amazon Titan Multimodal Embeddings G1 in Amazon Bedrock by following the Amazon Bedrock model access instructions.

Deploy to Amplify

Click the button above to deploy this solution with default parameters directly in your AWS account, or use the Amplify Console to set up GitHub access.
In the select service role section, create a new service role. See Amplify Service Role for the permissions required by the deployment role.

Caution

We advise you to restrict access to branches with a username and password, by following this guide, to limit resource consumption by unintended users.

Add a SPA Redirect

Amplify Service Role

Attach AdministratorAccess rather than AdministratorAccess-Amplify
- Optional: You can use AdministratorAccess-Amplify instead if you add a new IAM policy with the additional required permissions, which may include:
  - "aoss:BatchGetCollection"
  - "aoss:CreateAccessPolicy"
  - "aoss:CreateCollection"
  - "aoss:GetSecurityPolicy"
  - "aoss:CreateSecurityPolicy"
  - "aoss:DeleteSecurityPolicy"
  - "aoss:DeleteCollection"
  - "aoss:DeleteAccessPolicy"
  - "aoss:TagResource"
  - "aoss:UntagResource"
  - "kms:Decrypt"
  - "kms:Encrypt"
  - "kms:DescribeKey"
  - "kms:CreateGrant"
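The additional permissions listed above could be collected into a single IAM policy document. The following is a minimal sketch in Python that prints such a policy as JSON. The `Sid` value and the `"*"` resource scope are assumptions made for illustration; narrow the resource to your own collections and keys where you can.

```python
import json

# Actions copied from the list above.
ADDITIONAL_ACTIONS = [
    "aoss:BatchGetCollection",
    "aoss:CreateAccessPolicy",
    "aoss:CreateCollection",
    "aoss:GetSecurityPolicy",
    "aoss:CreateSecurityPolicy",
    "aoss:DeleteSecurityPolicy",
    "aoss:DeleteCollection",
    "aoss:DeleteAccessPolicy",
    "aoss:TagResource",
    "aoss:UntagResource",
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:DescribeKey",
    "kms:CreateGrant",
]


def build_additional_policy(resource="*"):
    """Return an IAM policy document that allows the additional actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AmplifyAdditionalPermissions",  # hypothetical name
                "Effect": "Allow",
                "Action": ADDITIONAL_ACTIONS,
                "Resource": resource,  # assumption: "*" for simplicity
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_additional_policy(), indent=2))
```

The printed JSON can be pasted into the IAM console as a customer managed policy and attached to the service role alongside AdministratorAccess-Amplify.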

Local Development Prerequisites

AWS CLI
Python 3.11
pip 24.0 or higher
virtualenv 20.25.0 or higher
Node.js v20.10.0 or higher
npm 10.5.0 or higher
Amplify CLI 12.10.1 or higher
- Use us-east-1 as the deployment region
- See Amplify Service Role for the permissions required by the deployment role

Local Build

amplify init
npm ci
amplify push
npm run dev

Important

We advise you to run the application in a sandbox account and deploy the frontend locally.

[Optional] Manually Deployed Cloud Hosted Frontend

Caution

With the default Cognito settings, any user can create and confirm an account. If you use the cloud-hosted frontend with these defaults, anyone who knows the deployed URL can upload images and video, which can incur unexpected charges in your AWS account. You can add a human review of new sign-up requests in Cognito by following the instructions in the Cognito Developer Guide for Allowing users to sign up in your app but confirming them as a user pool administrator.
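As a rough sketch of what an administrator-driven review could look like, the helpers below list users that are still unconfirmed and confirm one of them through the Cognito identity provider API. This assumes the user pool has already been configured so that users are not confirmed automatically, and that `cognito_client` is a boto3 `cognito-idp` client; the function names are hypothetical and pagination is left out for brevity.

```python
def list_unconfirmed_users(cognito_client, user_pool_id):
    """Return the usernames of users waiting for confirmation.

    Only the first page of results is read; a real script would follow
    the PaginationToken returned by list_users.
    """
    response = cognito_client.list_users(
        UserPoolId=user_pool_id,
        Filter='cognito:user_status = "UNCONFIRMED"',
    )
    return [user["Username"] for user in response.get("Users", [])]


def confirm_user(cognito_client, user_pool_id, username):
    """Confirm a single user after a human has reviewed the request."""
    cognito_client.admin_confirm_sign_up(
        UserPoolId=user_pool_id,
        Username=username,
    )


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("cognito-idp", region_name="us-east-1")
#   for name in list_unconfirmed_users(client, "<your-user-pool-id>"):
#       print(name)
```

Passing the client in as an argument keeps the helpers easy to try against a stub before pointing them at a real user pool.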

Deploy and host app
Add a SPA Redirect

SPA Redirect

Follow the instructions to create a redirect for single-page web apps (SPA).

Clean up resources

Full Cleanup Instructions
- Run amplify delete for a local build

Usage Instructions

Use the Sign In button to log in. Use the Create Account tab located at the top of the website to sign up for a new user account with your Amazon Cognito integration.
After successfully signing in, choose from the left sidebar to upload an image or video:

File Upload

Click the Choose files button
Select the images or videos from your local drive
Click Upload Files

Webcam Upload

Click Allow when your browser asks for permissions to access your webcam
Click Capture Image and Upload Image when you want to upload a single image from your webcam
Click Start Video Capture, Stop Video Capture and finally Upload Video to upload a video from your webcam

Search

Type your prompt in the Search Videos text field. Depending on what you uploaded in the previous steps, you can enter a prompt such as “Show me a person with glasses”.
If you see fewer results than you expected, lower the confidence parameter closer to 0.

Tip

The confidence is not a linear scale from 0 to 100. It represents the vector distance between the user's query and the image in the database, where 0 represents completely opposite vectors and 100 represents the same vector.
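To illustrate why such a score is not linear, here is a minimal sketch that maps the cosine similarity of two vectors onto a 0 to 100 range. The mapping `(1 + cos) / 2 * 100` is an assumption chosen for illustration and is not necessarily the formula this application uses; the two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def confidence(a, b):
    """Map cosine similarity from [-1, 1] onto [0, 100] (assumed mapping)."""
    return (1.0 + cosine_similarity(a, b)) / 2.0 * 100.0


print(confidence([1, 0], [1, 0]))   # same vector -> 100.0
print(confidence([1, 0], [-1, 0]))  # opposite vectors -> 0.0
print(confidence([1, 0], [0, 1]))   # perpendicular vectors -> 50.0
print(confidence([1, 0], [1, 1]))   # 45 degrees apart -> about 85.4
```

Under this mapping, a vector halfway in angle between "identical" and "perpendicular" scores about 85 rather than 75, which is why small changes to the confidence parameter near the top of the range can change the number of results considerably.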

Solution Walkthrough

Raw Solution Architecture Diagram

AWS Services Used

Amazon OpenSearch Serverless
Amazon Bedrock
AWS Lambda
Amazon S3
Amazon Cognito
AWS Elemental MediaConvert
AWS Amplify [Deploying and hosting frontend and backend]
Amazon CloudFront [optional when using cloud hosted front-end]

Manual clip upload process

A user manually uploads video clips to the S3 bucket (console, CLI, or SDK).
The S3 bucket that holds the video clips triggers an (s3:ObjectCreated) event for each clip (mp4 or webm) stored in it.
A Lambda function subscribed to the bucket's (s3:ObjectCreated) event queues a MediaConvert job that converts the video clip into JPEG images.
MediaConvert saves the converted images into an S3 bucket.
The S3 bucket triggers an (s3:ObjectCreated) event for each image (JPEG) stored in it.
A Lambda function subscribed to that (s3:ObjectCreated) event generates an embedding with Amazon Titan Multimodal Embeddings for every new image (JPEG) stored in the S3 bucket.
The Lambda function stores the embeddings in an OpenSearch Serverless index.
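The image half of the steps above can be sketched roughly as follows. This is not the repository's Lambda code: the function names and the `image_vector`, `bucket`, and `key` document fields are hypothetical, and `bedrock_client` is assumed to be a boto3 `bedrock-runtime` client passed in by the caller. The request body follows the Titan Multimodal Embeddings G1 format (`inputImage` as base64, optional `embeddingConfig`).

```python
import base64
import json
from urllib.parse import unquote_plus

MODEL_ID = "amazon.titan-embed-image-v1"  # Titan Multimodal Embeddings G1


def objects_from_s3_event(event, suffixes):
    """Yield (bucket, key) for each S3 record whose key ends with a suffix.

    `suffixes` is a tuple such as (".jpg", ".jpeg"). Keys in S3 event
    notifications are URL-encoded, so they are decoded first.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(suffixes):
            yield bucket, key


def embed_image(bedrock_client, image_bytes, dimensions=1024):
    """Return the embedding vector for one image."""
    body = json.dumps(
        {
            "inputImage": base64.b64encode(image_bytes).decode("utf-8"),
            "embeddingConfig": {"outputEmbeddingLength": dimensions},
        }
    )
    response = bedrock_client.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


def build_index_document(bucket, key, embedding):
    """Build the document to store in the index (field names are hypothetical)."""
    return {"bucket": bucket, "key": key, "image_vector": embedding}
```

A handler would loop over `objects_from_s3_event(event, (".jpg", ".jpeg"))`, read each object from S3, call `embed_image`, and send the result of `build_index_document` to the OpenSearch Serverless index.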

Automated video ingestion using Kinesis Video Stream

Alternatively, video clips can be ingested from a video source into a Kinesis video stream.
The Kinesis video stream saves the video as clips in the S3 bucket, which triggers the same path as steps 2-7 of the manual clip upload process.

Website Image Search

User browses the website.
CloudFront CDN fetches the static web files from S3.
User authenticates and gets a token from the Cognito user pool.
User makes a search request on the website, which passes the request to API Gateway.
API Gateway forwards the request to a Lambda function.
Lambda function passes the search query to Amazon Titan Multimodal Embeddings, which converts the request into an embedding.
Lambda function passes the embedding to OpenSearch as part of the search. OpenSearch returns the matching embeddings, and the Lambda function returns the matching images to the user.
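The last two steps can be sketched as follows, assuming a boto3 `bedrock-runtime` client is passed in and that the index stores vectors in a field called `image_vector` (a hypothetical name). The query uses the OpenSearch k-NN query shape; how the response score relates to the confidence parameter in the website is not shown here and would depend on the index configuration.

```python
import json

MODEL_ID = "amazon.titan-embed-image-v1"  # Titan Multimodal Embeddings G1


def embed_text(bedrock_client, text, dimensions=1024):
    """Convert the user's search text into an embedding vector."""
    body = json.dumps(
        {
            "inputText": text,
            "embeddingConfig": {"outputEmbeddingLength": dimensions},
        }
    )
    response = bedrock_client.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


def build_knn_query(vector, k=10, field="image_vector"):
    """Build an OpenSearch k-NN search body for the given query vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


def filter_hits(search_response, min_score):
    """Keep the stored documents whose score is at least min_score."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if hit["_score"] >= min_score]
```

The search body returned by `build_knn_query` would be sent to the OpenSearch Serverless collection with whichever signed HTTP client the Lambda function uses, and `filter_hits` applied to the response before returning image references to the user.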

Website Kinesis Integration

While this solution doesn't create or manage a Kinesis video stream, the website does include functionality for self-managed Kinesis video streams: it can display a live Kinesis video stream and, when an image is selected, replay video clips from that stream.

You can turn on this functionality by setting the kinesisVideoStreamIntegration parameter in the frontend CloudFormation template to True and setting KINESIS_VIDEO_STREAM_INTEGRATION to true in vite.config.js.

Suggested minimum changes if used in production environments