ttback / photo-to-recipe

photo to recipe generation with multi-agents



Photo to Recipe generation with Multi-agents

This project uses generative AI agents to generate recipes from food photos. Using LangGraph, several LLM-powered tools, and conditional workflows, the application extracts ingredients, retrieves relevant documents, generates recipes, and runs self-correcting workflows that catch and fix mistakes in the generated output.

Demo Video

Watch the demo video

Related Papers

  • Routing: Adaptive RAG (paper). Routes questions to different types of retrieval
  • Self-correction: Self-RAG (paper). Fix answers that either contain hallucinations or don't answer the question
  • LLM-Critic: LLM Critics Help Catch LLM Bugs (paper). Trains AI "critics" that help humans evaluate code written by other AI models more accurately.

Credits and Inspiration

  1. NVIDIA/GenerativeAIExamples
  2. LangGraph_HandlingAgent_IntermediateSteps
  3. Agent_use_tools_leveraging_NVIDIA_AI_endpoints.ipynb
  4. LangChain NVIDIA Integration
  5. Scenario for Image Assets generation
  6. ElevenLabs for audio in the demo video

How to Run

The project is built with LangChain/LangGraph. You only need Docker Compose to run it; follow the steps below to get started.

Prerequisites

  • An NVIDIA API key, provided through a .env file.
  • Ensure you have Docker and Docker Compose installed on your machine.

Steps to Run

  1. Clone the repository:
     git clone git@github.com:ttback/photo-to-recipe.git
     cd photo-to-recipe
  2. Set NVIDIA_API_KEY in a .env file (see .env.example).
  3. Build and run the Docker containers:
     docker compose up
  4. Open the app in your browser at localhost:7860.
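Before running `docker compose up`, you may want to confirm that the key is actually set. The following pre-flight check is a minimal sketch and is not part of this repository. `parse_env` and `check_api_key` are hypothetical helpers, and the sketch assumes the .env file uses simple `KEY=VALUE` lines.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper, not from the repo)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_api_key(path: str = ".env") -> bool:
    """Return True if the file exists and sets a non-empty NVIDIA_API_KEY."""
    env_file = Path(path)
    if not env_file.is_file():
        return False
    return bool(parse_env(env_file.read_text()).get("NVIDIA_API_KEY"))


if __name__ == "__main__":
    if check_api_key():
        print("NVIDIA_API_KEY is set; you can run `docker compose up`.")
    else:
        print("NVIDIA_API_KEY is missing; see .env.example.")
```

Run it from the repository root. It reports only whether the key exists and never prints the key itself.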

The images in the images folder can be used to test the basic workflow with a burger photo, a sushi photo, and a non-food photo from the NVIDIA image-captioning example. The vector DB contains burger recipes only, so the sushi photo exercises the most complete workflow: the initial RAG-based generation is rejected, and the ADDA team regenerates the recipe through the non-RAG process.

Key Multi-agent Features

  1. Unsupervised image type detection: Handles food vs. non-food images without user interaction.
  2. Automatic ingredient extraction from food photos: Uses a recent multimodal SLM (microsoft/phi-3-vision-128k-instruct) to extract ingredients from the food image.
  3. Document retrieval: Converts online web pages into a vector store using LangChain and NVIDIA's embedding model, NV-Embed-QA.
  4. Conditional (RAG or non-RAG) generation: Checks whether the retrieved documents are relevant to the recipe before proceeding with RAG-based generation. If the web URLs change content or become unavailable, the ADDA team skips RAG-based generation.
  5. RAG-based recipe generation: The writer agent generates the recipe from the retrieved documents.
  6. Automated hallucination checker: Agents check that the generated recipe is grounded in the documents and matches the food and ingredients detected in the input image.
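Once each grader has returned a verdict, the relevance gate in feature 4 and the reviewer checks in feature 6 become simple decisions. The sketch below is for illustration only. It assumes the graders return "yes"/"no" strings and that "most documents relevant" means a strict majority. The function names are hypothetical, and the repository's actual thresholds may differ.

```python
from typing import Literal

Grade = Literal["yes", "no"]


def most_documents_relevant(grades: list[Grade]) -> bool:
    """True when strictly more than half of the retrieved documents are graded relevant."""
    if not grades:
        return False  # nothing retrieved: fall back to non-RAG generation
    relevant = sum(1 for g in grades if g == "yes")
    return relevant * 2 > len(grades)


def choose_generator(grades: list[Grade]) -> str:
    """Pick the writer tool based on the relevance grades (assumed majority rule)."""
    return "rag_recipe_generator" if most_documents_relevant(grades) else "recipe_generator"


def accept_rag_recipe(grounded: Grade, answers_question: Grade) -> bool:
    """Keep a RAG recipe only if it is grounded AND addresses the question."""
    return grounded == "yes" and answers_question == "yes"


if __name__ == "__main__":
    print(choose_generator(["yes", "yes", "no"]))  # majority relevant
    print(choose_generator(["no", "no", "yes"]))  # falls back to non-RAG
    print(accept_rag_recipe("no", "yes"))  # hallucination detected: reject
```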

AI Agents and LLM-Powered Tools

| Role | Description | Tools |
|---|---|---|
| Reader | Reads image content | image_router, ingredients_recognizer, image_caption |
| Searcher | Searches the archive (vector DB) | doc_retriever, relevance_grader |
| Writer | Writes the recipe | rag_recipe_generator, recipe_generator |
| Reviewer | Reviews the recipe | hallucination_grader, answer_grader |
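The role-to-tool assignment in this table can be stored as a plain mapping, with a reverse index that looks up which agent owns a given tool. This is an illustrative data structure only. `AGENT_TOOLS`, `TOOL_OWNER`, and `owner_of` are hypothetical names, not identifiers from the repository.

```python
# Hypothetical registry mirroring the agent/tool table.
AGENT_TOOLS: dict[str, tuple[str, ...]] = {
    "Reader": ("image_router", "ingredients_recognizer", "image_caption"),
    "Searcher": ("doc_retriever", "relevance_grader"),
    "Writer": ("rag_recipe_generator", "recipe_generator"),
    "Reviewer": ("hallucination_grader", "answer_grader"),
}

# Reverse index from tool name to the agent role that owns it.
TOOL_OWNER: dict[str, str] = {
    tool: role for role, tools in AGENT_TOOLS.items() for tool in tools
}

# Each tool should belong to exactly one agent.
assert len(TOOL_OWNER) == sum(len(tools) for tools in AGENT_TOOLS.values())


def owner_of(tool: str) -> str:
    """Return the agent role responsible for a tool."""
    if tool not in TOOL_OWNER:
        raise ValueError(f"unknown tool: {tool!r}")
    return TOOL_OWNER[tool]


if __name__ == "__main__":
    print(owner_of("hallucination_grader"))
```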

Tools

| Tool | Description | Model |
|---|---|---|
| image_router | Routes the image to the appropriate processing path based on its content. | microsoft/phi-3-vision-128k-instruct |
| ingredients_recognizer | Extracts ingredients from the image. | microsoft/phi-3-vision-128k-instruct |
| image_caption | Generates a caption for the image. | microsoft/phi-3-vision-128k-instruct |
| doc_retriever | Retrieves documents relevant to the question from a vector store built from pages downloaded from food.com. | NV-Embed-QA |
| relevance_grader | Grades the relevance of retrieved documents to the question. | meta/llama3-70b-instruct |
| rag_recipe_generator | Generates a recipe using RAG over the retrieved documents. | meta/llama3-70b-instruct |
| recipe_generator | Generates a recipe without RAG. | mistralai/mixtral-8x7b-instruct-v0.1 |
| hallucination_grader | Grades the generated recipe for hallucinations. | meta/llama3-70b-instruct |
| answer_grader | Grades the generated recipe against the documents and question. | meta/llama3-70b-instruct |
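The doc_retriever row works like any embedding-based retriever. It embeds the question and each document, then returns the documents closest to the question. The sketch below swaps NV-Embed-QA for a toy bag-of-words embedding with cosine similarity, so it runs on the standard library alone. `embed`, `cosine`, and `retrieve` are hypothetical names. Real embeddings capture meaning far better than word counts do.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "Classic beef burger with lettuce.",
        "Smash burger with beef.",
        "Lemon pound cake with glaze.",
    ]
    # prints ['Classic beef burger with lettuce.', 'Smash burger with beef.']
    print(retrieve("beef burger with lettuce", docs))
```

A burger-only archive behaves the same way for a sushi question: every document scores low, so the relevance grader is likely to reject most of them.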

Diagram

graph TD
    A[Start] --> B{Is it a food image?}
    B -->|Yes| C[Extract Ingredients]
    B -->|No| D[Image Caption]
    C --> E[Retrieve Recipe Documents]
    E --> F{Are most recipe documents relevant?}
    F -->|Yes| G[Generate Recipe using RAG]
    F -->|No| H[Generate Recipe without RAG]
    G --> I{Is the RAG generation grounded in documents?}
    I -->|Yes| J{Does the RAG generation address the question?}
    I -->|No| H  
    J -->|Yes| K[End]
    J -->|No| H
    D --> L[End]
    H --> K
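You can read the diagram as a small state machine. The sketch below walks the same branches in plain Python rather than through LangGraph's API. The `Tools` fields stand in for the LLM-backed tools listed in the tools table and are hypothetical. The stubs in `demo_tools` are also hypothetical; they mimic this repo's burger-only vector DB.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Tools:
    """Hypothetical stand-ins for the LLM-backed tools in the workflow."""
    is_food: Callable[[str], bool]
    extract_ingredients: Callable[[str], list[str]]
    caption: Callable[[str], str]
    retrieve: Callable[[list[str]], list[str]]
    doc_is_relevant: Callable[[str, list[str]], bool]
    rag_generate: Callable[[list[str], list[str]], str]
    generate: Callable[[list[str]], str]
    is_grounded: Callable[[str, list[str]], bool]
    answers_question: Callable[[str, list[str]], bool]


def run(image: str, t: Tools) -> tuple[str, list[str]]:
    """Follow the diagram's branches; return (output, names of the steps taken)."""
    path: list[str] = []
    # B: is it a food image? If not, caption it and stop (D -> L).
    if not t.is_food(image):
        path.append("image_caption")
        return t.caption(image), path
    # C, E: extract ingredients, then retrieve recipe documents.
    path.append("extract_ingredients")
    ingredients = t.extract_ingredients(image)
    path.append("retrieve_documents")
    docs = t.retrieve(ingredients)
    # F: are most documents relevant? (strict majority, an assumption)
    relevant = sum(t.doc_is_relevant(d, ingredients) for d in docs)
    if docs and relevant * 2 > len(docs):
        # G: RAG generation, then I (grounded?) and J (answers the question?).
        path.append("generate_rag")
        recipe = t.rag_generate(ingredients, docs)
        if t.is_grounded(recipe, docs) and t.answers_question(recipe, ingredients):
            return recipe, path
    # H: any "No" above falls back to non-RAG generation.
    path.append("generate_no_rag")
    return t.generate(ingredients), path


def demo_tools(dish: str) -> Tools:
    """Stub tools; the archive holds burger recipes only, like this repo's vector DB."""
    burger_docs = ["beef burger recipe", "cheeseburger recipe"]
    return Tools(
        is_food=lambda img: dish != "not_food",
        extract_ingredients=lambda img: [dish],
        caption=lambda img: "a photo with no food",
        retrieve=lambda ings: burger_docs,
        doc_is_relevant=lambda doc, ings: any(i in doc for i in ings),
        rag_generate=lambda ings, docs: f"RAG recipe for {ings[0]}",
        generate=lambda ings: f"recipe for {ings[0]}",
        is_grounded=lambda recipe, docs: True,
        answers_question=lambda recipe, ings: True,
    )


if __name__ == "__main__":
    for dish in ("burger", "sushi", "not_food"):
        print(dish, run(f"{dish}.jpg", demo_tools(dish)))
    # A RAG recipe that fails the hallucination check is regenerated without RAG.
    ungrounded = replace(demo_tools("burger"), is_grounded=lambda recipe, docs: False)
    print("ungrounded burger", run("burger.jpg", ungrounded))
```

Keeping the branching in one function makes each diagram edge easy to exercise with stubs. In the project itself, these decisions presumably live in LangGraph conditional edges.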

About


License: MIT License


Languages: Python 99.1%, Dockerfile 0.9%