BastinFlorian / gcp-llm-retrieval-augmentation

A retrieval augmentation LLM demo in GCP

Home Page: https://llmops-demos-frg.web.app

LLM retrieval augmentation in Google Cloud

This demo combines GCP Matching Engine and Vertex AI PaLM to build a retrieval-augmented question answering system: the user asks a question, relevant documents are retrieved, and the LLM uses the retrieved context to answer it.
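The retrieval-augmentation loop can be sketched in plain Python. The bag-of-words embedding, cosine scoring, and prompt template below are illustrative stand-ins for Vertex AI embeddings, Matching Engine, and PaLM, not the demo's actual code:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; the real demo uses Vertex AI text embeddings.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Stand-in for a Matching Engine nearest-neighbour query.
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # The LLM is instructed to answer only from the retrieved context.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

docs = [
    "The Normans gave their name to Normandy, a region in France.",
    "SQuAD is a reading comprehension dataset built from Wikipedia articles.",
    "Matching Engine provides low-latency vector similarity search on GCP.",
]
print(build_prompt("What is SQuAD?", retrieve("What is SQuAD?", docs)))
```

The key design point is the same as in the demo: the model never answers from its own weights alone; retrieval picks the context, and the prompt constrains the answer to it.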

The dataset used is the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles.
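SQuAD ships as nested JSON (articles → paragraphs → question/answer pairs). A small sketch of flattening it into (context, question, answer) triples for indexing, using a single hand-written record in the SQuAD v1.1 layout:

```python
# Minimal hand-written record mimicking the SQuAD v1.1 JSON layout.
squad = {
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "context": "The Normans were the people who gave their name to Normandy.",
                    "qas": [
                        {
                            "question": "What region was named after the Normans?",
                            "id": "q1",
                            "answers": [{"text": "Normandy", "answer_start": 51}],
                        }
                    ],
                }
            ],
        }
    ]
}

def flatten(dataset: dict) -> list[tuple[str, str, str]]:
    """Yield (context, question, first answer) triples for indexing."""
    rows = []
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                answer = qa["answers"][0]["text"] if qa["answers"] else ""
                rows.append((para["context"], qa["question"], answer))
    return rows

rows = flatten(squad)
```

Each paragraph's `context` is what gets embedded and indexed; the question/answer pairs are useful for evaluating retrieval quality.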

The demo can be accessed at https://llmops-demos-frg.web.app.

Services used

Architecture

Frameworks:

Prerequisites

Docs

  1. Infrastructure and Matching Engine Setup: set up the required infrastructure with Terraform and create the Matching Engine index
  2. Create embeddings: generate embeddings for the documents and index them in Matching Engine
  3. Firestore: index the documents in Firestore
  4. LangChain Retriever and Agent: create a LangChain retriever and conversational agent
  5. Cloud Run: package the code and deploy the API to Cloud Run
  6. Firebase WebUI: create the web app
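A rough shape of how steps 2-4 fit together, with an in-memory dict standing in for Firestore and a token-set list standing in for the Matching Engine index. All names here are illustrative; the demo's actual components are the GCP services and LangChain classes referenced above:

```python
from dataclasses import dataclass, field

@dataclass
class DemoQA:
    """Toy wiring of the pipeline: a doc store (stand-in for Firestore),
    a vector index (stand-in for Matching Engine), and a conversational
    loop that stuffs retrieved context into the answer."""
    doc_store: dict[str, str] = field(default_factory=dict)            # id -> text
    index: list[tuple[str, frozenset]] = field(default_factory=list)   # id -> token-set "embedding"
    history: list[tuple[str, str]] = field(default_factory=list)       # (question, answer)

    def add_document(self, doc_id: str, text: str) -> None:
        self.doc_store[doc_id] = text                                  # step 3: Firestore
        self.index.append((doc_id, frozenset(text.lower().split())))   # step 2: embed + index

    def retrieve(self, question: str, k: int = 1) -> list[str]:
        q = frozenset(question.lower().split())
        ranked = sorted(self.index, key=lambda e: len(q & e[1]), reverse=True)
        return [self.doc_store[doc_id] for doc_id, _ in ranked[:k]]

    def ask(self, question: str) -> str:
        context = self.retrieve(question)                              # step 4: retriever
        # A real agent would call the LLM here with context + chat history.
        answer = f"Based on: {context[0]}"
        self.history.append((question, answer))
        return answer
```

Usage: `DemoQA().add_document(...)` then `ask(...)`; step 5 would wrap `ask` in an HTTP endpoint on Cloud Run, and step 6 calls that endpoint from the Firebase web UI.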

Languages

Jupyter Notebook 68.3%, Svelte 14.7%, HCL 7.3%, Python 6.4%, JavaScript 1.1%, HTML 0.7%, CSS 0.5%, TypeScript 0.5%, Dockerfile 0.4%