ranasaurus9 / Multimodal-RAG-Gemini-MongoDB

This is a sample code implementation of multimodal RAG using Google Gemini and MongoDB Atlas Vector Search.


Multimodal-RAG-Gemini-MongoDB

Objectives

This notebook provides a step-by-step guide to building a document Q&A engine with multimodal retrieval augmented generation (RAG):

  1. Extract and store metadata from documents containing both text and images, and generate embeddings for the documents (see the first sketch after this list)
  2. Search the metadata with text queries to find similar text or images
  3. Search the metadata with image queries to find similar images
  4. Using a text query as input, search for contextual answers using both text and images (see the second sketch below)
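The sketch below illustrates steps 1 through 3, assuming the Vertex AI `multimodalembedding@001` model and an Atlas Vector Search index named `vector_index` on an `embedding` field. The project, database, collection, and index names here are illustrative, not taken from the notebook:

```python
# A minimal sketch, not the notebook's exact code. Assumed names: GCP
# project "your-gcp-project", database "rag_db", collection "documents",
# and an Atlas Vector Search index "vector_index" on the "embedding"
# field (1408 dimensions, cosine similarity).
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel
from pymongo import MongoClient

vertexai.init(project="your-gcp-project", location="us-central1")
embedding_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
collection = MongoClient("<ATLAS_CONNECTION_STRING>")["rag_db"]["documents"]

# Step 1: embed each extracted chunk (text and/or image) and store it
# alongside its metadata.
def store_chunk(text=None, image_path=None, metadata=None):
    embeddings = embedding_model.get_embeddings(
        image=Image.load_from_file(image_path) if image_path else None,
        contextual_text=text,
    )
    # Prefer the image embedding when a chunk has both modalities.
    vector = embeddings.image_embedding or embeddings.text_embedding
    collection.insert_one({
        "text": text,
        "image_path": image_path,
        "metadata": metadata or {},
        "embedding": vector,
    })

# Steps 2 and 3: a text query or an image query is embedded the same
# way, then matched with Atlas's $vectorSearch aggregation stage.
def vector_search(query_text=None, query_image_path=None, k=5):
    embeddings = embedding_model.get_embeddings(
        image=Image.load_from_file(query_image_path) if query_image_path else None,
        contextual_text=query_text,
    )
    query_vector = embeddings.image_embedding or embeddings.text_embedding
    return list(collection.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": k,
        }},
        {"$project": {"text": 1, "image_path": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]))
```

Because the multimodal embedding model maps text and images into the same 1408-dimensional space, a single vector index serves both text and image queries.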

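Step 4 can then combine the retrieved text and images into a single Gemini prompt. A minimal sketch, reusing the hypothetical `vector_search` helper above; the model name is an assumption, so substitute whichever Gemini multimodal model the notebook targets:

```python
# Step 4 sketch: feed retrieved text and images to Gemini as grounding
# context. "gemini-1.0-pro-vision" is an assumed model name, not
# necessarily the one used in the notebook.
from vertexai.generative_models import GenerativeModel, Image as GenImage, Part

def answer(query, k=5):
    hits = vector_search(query_text=query, k=k)  # helper defined above
    parts = [Part.from_text(f"Answer the question using the context below.\nQuestion: {query}\nContext:")]
    for hit in hits:
        if hit.get("text"):
            parts.append(Part.from_text(hit["text"]))
        if hit.get("image_path"):
            parts.append(Part.from_image(GenImage.load_from_file(hit["image_path"])))
    return GenerativeModel("gemini-1.0-pro-vision").generate_content(parts).text

print(answer("What does the architecture diagram show?"))
```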
References

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview

https://www.mongodb.com/products/platform/atlas-vector-search

https://blog.langchain.dev/semi-structured-multi-modal-rag/

https://unstructured-io.github.io/unstructured/introduction.html


License: MIT License


Languages

Language: Jupyter Notebook 100.0%