Objectives

This notebook provides a step-by-step guide to building a document Q&A engine using multimodal retrieval augmented generation (RAG):
- Extract and store metadata from documents containing both text and images, and generate embeddings for the documents
- Search the metadata with text queries to find similar text or images
- Search the metadata with image queries to find similar images
- Using a text query as input, search for contextual answers that draw on both text and images (a minimal sketch of the embed-and-search flow follows this list)
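
To make the flow concrete, here is a minimal sketch of the core embed-and-search steps, using the Vertex AI multimodal embedding model and a MongoDB Atlas `$vectorSearch` aggregation. The project ID, connection string, database/collection names, index name, and field names below are placeholder assumptions; the rest of the notebook builds these pieces out step by step.

```python
# Minimal sketch (assumes an Atlas collection "documents" with an
# Atlas Vector Search index named "vector_index" on an "embedding" field).
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel
from pymongo import MongoClient

vertexai.init(project="your-gcp-project", location="us-central1")
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
collection = client["rag_demo"]["documents"]

def embed_text(text: str) -> list[float]:
    """Embed a text query into the shared text/image embedding space."""
    return model.get_embeddings(contextual_text=text).text_embedding

def embed_image(path: str) -> list[float]:
    """Embed an image file into the same embedding space."""
    return model.get_embeddings(image=Image.load_from_file(path)).image_embedding

def vector_search(query_vector: list[float], top_k: int = 5):
    """Return the closest stored text chunks or images via Atlas Vector Search."""
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 100,
                "limit": top_k,
            }
        },
        {"$project": {"_id": 0, "text": 1, "image_path": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
    return list(collection.aggregate(pipeline))

# Example: a text query retrieves both text and image results,
# since both kinds of embeddings live in the same vector space.
hits = vector_search(embed_text("What does the architecture diagram show?"))
```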
References
- Vertex AI multimodal models overview: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
- MongoDB Atlas Vector Search: https://www.mongodb.com/products/platform/atlas-vector-search
- LangChain blog on semi-structured and multi-modal RAG: https://blog.langchain.dev/semi-structured-multi-modal-rag/
- Unstructured documentation: https://unstructured-io.github.io/unstructured/introduction.html