Deploy a private ChatGPT alternative hosted within your VPC. Connect it to your organization's knowledge base and use it as a corporate oracle. Supports open-source LLMs like Llama 2, Falcon, and GPT4All.
Retrieval Augmented Generation (RAG) is a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other systems and inserting it into the LLM’s context window via a prompt. This gives LLMs information beyond what was provided in their training data, which almost every enterprise use case requires. Examples include data from current web pages, data from SaaS apps like Confluence or Salesforce, and data from documents like sales contracts and PDFs.
RAG works better than fine-tuning the model because it’s cheaper and faster, and it’s more reliable, since the source of the information is provided with each response and can be checked.
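To make the "retrieve, then insert into the prompt" step concrete, here is a minimal Python sketch of how retrieved passages might be placed into an LLM prompt. The template wording, the `build_rag_prompt` name, and the sample passages are hypothetical illustrations, not RAGstack's actual prompt. Numbering the passages is one way to let the model point back to the source it used.

```python
# Minimal sketch of the "augmentation" step in RAG: retrieved passages are
# pasted into the prompt so the LLM can answer from them.
# PROMPT_TEMPLATE and build_rag_prompt are hypothetical, not RAGstack code.

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Number each retrieved passage and insert it into the prompt template."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    # Hypothetical passages standing in for chunks retrieved from your documents.
    passages = [
        "The Acme contract renews annually on March 1.",
        "Either party may terminate with 30 days' written notice.",
    ]
    print(build_rag_prompt("When does the Acme contract renew?", passages))
```

The resulting string is what gets sent to the LLM; the model never sees the rest of your knowledge base, only the passages chosen by retrieval.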
RAGstack deploys the following resources for retrieval-augmented generation:
- GPT4All: When you run locally, RAGstack downloads and deploys Nomic AI's gpt4all model, which runs on consumer CPUs.
- Falcon-7b: On the cloud, RAGstack deploys Technology Innovation Institute's falcon-7b model onto a GPU-enabled GKE cluster.
- Llama 2: On the cloud, RAGstack can also deploy the 7B-parameter version of Meta's Llama 2 model onto a GPU-enabled GKE cluster.
- Qdrant: Qdrant is an open-source, self-hostable vector database written in Rust and built for high performance.
- Server and UI: A simple server and UI that handle PDF upload, so that you can chat over your PDFs using Qdrant and the open-source LLM of your choice.
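Conceptually, the vector database's job in this stack is similarity search: stored document chunks are represented as embedding vectors, and a question is answered by fetching the chunks whose vectors lie closest to the question's vector. The sketch below is a toy, in-memory stand-in for that idea using cosine similarity; it is not Qdrant's API, and the hand-made 3-dimensional vectors are placeholders for embeddings that a real pipeline would get from an embedding model.

```python
import math

# Toy in-memory stand-in for the similarity search a vector database such as
# Qdrant performs. NOT Qdrant's API: the vectors are hand-made placeholders
# for real embeddings.


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query, best first."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = [
        ("Refunds are accepted within 30 days.", [0.9, 0.1, 0.0]),
        ("The office opens at 9 a.m.", [0.0, 0.2, 0.9]),
        ("Returns require the original receipt.", [0.8, 0.3, 0.1]),
    ]
    query = [1.0, 0.2, 0.0]  # pretend embedding of "How do I get a refund?"
    print(top_k(query, store))
```

A real vector database adds persistence, metadata filtering, and approximate-nearest-neighbour indexes so this search stays fast over millions of chunks instead of scanning every vector.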

To run RAGstack locally, go through the following steps:
- Copy `ragstack-ui/local.env` into `ragstack-ui/.env`.
- Copy `server/example.env` into `server/.env`.
- In `server/.env`, replace `YOUR_SUPABASE_URL` with your Supabase project URL and `YOUR_SUPABASE_KEY` with your Supabase secret API key. In `ragstack-ui/.env`, replace `YOUR_SUPABASE_URL` with your Supabase project URL and `YOUR_SUPABASE_PUBLIC_KEY` with your Supabase public (anon) API key. You can find these values in your Supabase dashboard under Settings > API.
- In Supabase, create a table `ragstack_users` with the following columns:

  | Column name | Type |
  | --- | --- |
  | `id` | `uuid` |
  | `app_id` | `uuid` |
  | `secret_key` | `uuid` |
  | `email` | `text` |
  | `avatar_url` | `text` |
  | `full_name` | `text` |

  If you enabled row-level security, make sure that the insert policy has a `WITH CHECK` expression, and the select policy a `USING` expression, of `(auth.uid() = id)`.
- Run `scripts/local/run-dev`. This downloads `ggml-gpt4all-j-v1.3-groovy.bin` into `server/llm/local/` and runs the server, LLM, and Qdrant vector database locally.

All services are ready once you see the following message:
INFO: Application startup complete.
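The local setup steps can be scripted. Below is a minimal Python sketch of two pieces: filling the env files from their templates and watching the dev script's output for the ready message. The file paths, placeholder names, and log line come from the steps above; the helper functions `fill_env` and `wait_until_ready` are hypothetical and not part of RAGstack, and the sketch assumes the placeholders appear literally in the template files.

```python
from collections.abc import Iterable
from pathlib import Path

# Hypothetical setup helpers (not part of RAGstack). Paths, placeholders and
# the ready message are taken from the README's local setup steps.
READY_MESSAGE = "INFO: Application startup complete."


def fill_env(template: Path, target: Path, values: dict[str, str],
             overwrite: bool = False) -> None:
    """Copy an example env file to `target`, substituting each placeholder."""
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True")
    text = template.read_text()
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    target.write_text(text)


def wait_until_ready(lines: Iterable[str]) -> bool:
    """Read log lines until the ready message appears; False if output ends first."""
    for line in lines:
        if READY_MESSAGE in line:
            return True
    return False


# Example usage (project_url, secret_key and anon_key are values you copy
# from Settings > API in the Supabase dashboard):
# fill_env(Path("server/example.env"), Path("server/.env"),
#          {"YOUR_SUPABASE_URL": project_url, "YOUR_SUPABASE_KEY": secret_key})
# fill_env(Path("ragstack-ui/local.env"), Path("ragstack-ui/.env"),
#          {"YOUR_SUPABASE_URL": project_url, "YOUR_SUPABASE_PUBLIC_KEY": anon_key})
#
# import subprocess
# proc = subprocess.Popen(["scripts/local/run-dev"], stdout=subprocess.PIPE,
#                         stderr=subprocess.STDOUT, text=True)
# if wait_until_ready(proc.stdout):
#     print("RAGstack is up")
```

`stderr` is merged into `stdout` in the usage example in case the message is logged to standard error. If you keep the process running after the ready message, keep reading `proc.stdout` (or redirect it to a file), otherwise the pipe can fill up and block the child process.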
To deploy the RAG stack using Falcon-7B running on GPUs to your own Google Cloud project, go through the following steps:
- Run `scripts/gcp/deploy-gcp.sh`. This prompts you for your GCP project ID, service account key file, and region, as well as some other parameters (model, Hugging Face token, etc.).
- If you get an error on the Falcon-7B deployment step, run the following commands and then run `scripts/gcp/deploy-gcp.sh` again:
  ```shell
  gcloud config set compute/zone YOUR-REGION-HERE
  gcloud container clusters get-credentials gpu-cluster
  kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
  ```
The deployment script was implemented using Terraform.
- You can run the frontend by creating a `.env` file in `ragstack-ui` and setting `VITE_SERVER_URL` to the URL of the `ragstack-server` service in Google Cloud Run.
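Setting the frontend URL comes down to one `KEY=value` line in a dotenv file. The Python sketch below assumes `ragstack-ui/.env` uses plain `KEY=value` lines (no `export` prefix); `set_env_var` is a hypothetical helper, not part of RAGstack. On Google Cloud, `gcloud run services describe ragstack-server --region=YOUR-REGION --format='value(status.url)'` should print the service URL, assuming the Cloud Run service is named `ragstack-server`.

```python
from pathlib import Path

# Hypothetical helper (not part of RAGstack) that sets KEY=value in a
# dotenv-style file. Assumes plain KEY=value lines (no `export` prefix).


def set_env_var(env_file: Path, key: str, value: str) -> None:
    """Replace the first KEY=... line, or append one if the key is absent."""
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    prefix = f"{key}="
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            break
    else:
        lines.append(prefix + value)
    env_file.write_text("\n".join(lines) + "\n")


# Example (the URL is a placeholder for your ragstack-server URL):
# set_env_var(Path("ragstack-ui/.env"), "VITE_SERVER_URL",
#             "https://YOUR-RAGSTACK-SERVER-URL")
```

Updating an existing line instead of always appending keeps the file free of duplicate keys when you redeploy and the URL changes. Vite reads `.env` files when the dev server starts, so restart it after changing `VITE_SERVER_URL`.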
To deploy the RAG stack using Falcon-7B running on GPUs to your own AWS EC2 instances (using ECS), go through the following steps:
- Run `scripts/aws/deploy-aws.sh`. This prompts you for your AWS credentials as well as some other parameters (model, Hugging Face token, etc.).
The deployment script was implemented using Terraform.
- You can run the frontend by creating a `.env` file in `ragstack-ui` and setting `VITE_SERVER_URL` to the URL of the Application Load Balancer (ALB).
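To look up the ALB address programmatically, one option is boto3's `elbv2` client, whose `describe_load_balancers` call returns each load balancer's `LoadBalancerName` and `DNSName`. The sketch below keeps the parsing in a pure function so it can be checked without AWS access. The load balancer name is a placeholder, and serving over plain HTTP is an assumption; check what the Terraform deployment actually created.

```python
# Hypothetical helper (not part of RAGstack) that turns an elbv2
# DescribeLoadBalancers response into a base URL. Plain HTTP is an
# assumption; use "https" if your listener terminates TLS.


def alb_url(response: dict, name: str, scheme: str = "http") -> str:
    """Find the load balancer called `name` and build a URL from its DNS name."""
    for lb in response.get("LoadBalancers", []):
        if lb.get("LoadBalancerName") == name:
            return f"{scheme}://{lb['DNSName']}"
    raise LookupError(f"no load balancer named {name!r} in response")


# Example with boto3 (the load balancer name is a placeholder):
# import boto3
# response = boto3.client("elbv2").describe_load_balancers(Names=["YOUR-ALB-NAME"])
# print(alb_url(response, "YOUR-ALB-NAME"))
```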
To deploy the RAG stack using Falcon-7B running on GPUs to your own Azure Kubernetes Service (AKS) cluster, go through the following steps:
- Run `./azure/deploy-aks.sh`. This prompts you for your Azure subscription as well as some other parameters (model, Hugging Face token, etc.).
The deployment script was implemented using Terraform.
- You can run the frontend by creating a `.env` file in `ragstack-ui` and setting `VITE_SERVER_URL` to the URL of the `ragstack-server` service in your AKS cluster.
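One way to find that URL from Python is to ask `kubectl` for the service as JSON and read its external address. This sketch assumes `ragstack-server` is a Kubernetes Service of type `LoadBalancer` in kubectl's current namespace and uses the first listed port; both are assumptions about what the deployment script creates, and `fetch_service`/`service_url` are hypothetical helpers.

```python
import json
import subprocess

# Hypothetical helpers (not part of RAGstack). Assumes `ragstack-server` is a
# Kubernetes Service of type LoadBalancer in kubectl's current namespace.


def fetch_service(name: str = "ragstack-server") -> dict:
    """Return the Service object parsed from `kubectl get service NAME -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "service", name, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def service_url(service: dict, scheme: str = "http") -> str:
    """Build a base URL from the Service's external address and first port."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    if not ingress:
        raise RuntimeError("external address not assigned yet; try again shortly")
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    port = service["spec"]["ports"][0]["port"]
    return f"{scheme}://{host}:{port}"


# Example:
# print(service_url(fetch_service()))
```

Right after deployment the cloud load balancer may still be provisioning, in which case `ingress` is empty and `service_url` raises instead of returning a half-built URL.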
Note that this AKS deployment uses a node pool with NVIDIA Tesla T4 accelerators, which are not available in all subscriptions.
- ✅ GPT4All support
- ✅ Falcon-7b support
- ✅ Deployment on GCP
- ✅ Deployment on AWS
- ✅ Deployment on Azure
- 🚧 Llama-2-40b support
The code for containerizing Falcon 7B is from Het Trivedi's tutorial repo. Check out his Medium article on how to dockerize Falcon here!