yanqic / gpt-qa


Mercury

A ChatGPT Embedding Template - inspired by @gannonh

This template gives you two sandboxes for exploring the OpenAI Chat API, with a third on the way:

1. Domain-Specific - [Perplexity clone]

  • Perplexity-style sandbox
  • Trained on specific websites that you define
  • Cites its sources

2. Chat - [ChatGPT clone]

  • Conversational ChatGPT-style sandbox with in-chat memory
  • Built-in Markdown renderer for code snippets

3. Domain-Specific File Chat - [perplexity clone]

Coming soon

Domain-Specific Overview: /pages/embed

ChatGPT is a great tool for answering general questions, but it falls short when it comes to answering domain-specific questions as it often makes up answers to fill its knowledge gaps and doesn't cite sources. To solve this issue, this starter app uses embeddings coupled with vector search. This app shows how OpenAI's GPT-3 API can be used to create conversational interfaces for domain-specific knowledge.

Embeddings are vectors of floating-point numbers that represent the "relatedness" of text strings. They are very useful for tasks like ranking search results, clustering, and classification. In text embeddings, a high cosine similarity between two embedding vectors indicates that the corresponding text strings are highly related.
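The relatedness measure can be computed directly. A minimal TypeScript sketch of cosine similarity between two embedding vectors:

```typescript
// Cosine similarity: the dot product of two vectors divided by the
// product of their magnitudes. Scores near 1 mean the underlying
// texts are highly related.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // identical direction → 1
cosineSimilarity([1, 0], [0, 1]); // orthogonal → 0
```

This is the same quantity the database computes later: pgvector's `<=>` operator returns cosine *distance*, so the SQL below uses `1 - (embedding <=> query_embedding)` to recover similarity.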

This app uses embeddings to generate a vector representation of a document and then uses vector search to find the most similar documents to the query. The results of the vector search are then used to construct a prompt for GPT-3, which generates a response. The response is then streamed back to the user.


Domain-Specific Details: /pages/embed

[model gpt-3.5-turbo]

1. Creating and storing the embeddings: /api/generate-embeddings

  • Web pages are scraped using cheerio, cleaned to plain text, and split into 1000-character documents.
  • OpenAI's embedding API is used to generate embeddings for each document using the "text-embedding-ada-002" model.
  • The embeddings are stored in a Supabase postgres table using pgvector. The table has three columns: the document text, the source URL, and the embedding vectors returned from the OpenAI API.
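The chunk-and-embed step can be sketched as follows. The function names are illustrative (not the repo's actual exports), and the call assumes OpenAI's public `/v1/embeddings` REST endpoint; the repo may use the official SDK instead:

```typescript
// Split cleaned page text into ~1000-character documents.
function chunkText(text: string, size = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Generate an embedding for one chunk via OpenAI's /v1/embeddings endpoint.
async function embedChunk(input: string, apiKey: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input }),
  });
  const json = await res.json();
  // For this model, each embedding is a 1536-dimensional float vector.
  return json.data[0].embedding;
}
```

Each resulting (content, url, embedding) triple maps onto one row of the `documents` table created in the Supabase setup section.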

2. Responding to queries: /api/get-embeddings

  • A single embedding is generated from the user prompt.
  • The embedding is used to perform a similarity search against the vector database.
  • The results of the similarity search are used to construct a prompt for GPT-3.
  • The GPT-3 response is then streamed back to the user.
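A rough sketch of that query flow, assuming the `match_documents` SQL function from the setup section is called over Supabase's PostgREST RPC endpoint. Function names, the similarity threshold, and the prompt wording are assumptions, not the repo's exact code:

```typescript
type MatchedDocument = { id: number; content: string; url: string; similarity: number };

// Postgres functions are exposed over Supabase's PostgREST /rpc endpoint.
async function matchDocuments(
  supabaseUrl: string,
  anonKey: string,
  queryEmbedding: number[]
): Promise<MatchedDocument[]> {
  const res = await fetch(`${supabaseUrl}/rest/v1/rpc/match_documents`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      apikey: anonKey,
      Authorization: `Bearer ${anonKey}`,
    },
    body: JSON.stringify({
      query_embedding: queryEmbedding,
      similarity_threshold: 0.78, // illustrative value
      match_count: 10,
    }),
  });
  return res.json();
}

// Fold the matched documents into a prompt that asks the model to cite sources.
function buildPrompt(question: string, docs: { content: string; url: string }[]): string {
  const context = docs
    .map((d) => `${d.content}\nSOURCE: ${d.url}`)
    .join("\n---\n");
  return `Answer the question using only the context below, and cite your sources.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```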

Chat Overview: /pages/chat

[model gpt-3.5-turbo]

The OpenAI API chat feature uses a machine learning model to generate responses to user input. It can be fine-tuned on specific datasets and scenarios to create chatbots that provide contextually relevant and effective responses.


  • OpenAI API (ChatGPT) - streaming /api/chat
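The in-chat memory works by resending the full message history with every request, so the model sees earlier turns. A sketch of that idea (types and function names are assumptions, not the repo's actual code):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Append a completed user/assistant exchange to the conversation history.
function appendTurn(history: ChatMessage[], user: string, assistant: string): ChatMessage[] {
  return [
    ...history,
    { role: "user", content: user },
    { role: "assistant", content: assistant },
  ];
}

// Body for the next call to the streaming /api/chat Edge route:
// prior turns plus the new user input, with streaming enabled.
function nextRequestBody(history: ChatMessage[], userInput: string): string {
  return JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [...history, { role: "user", content: userInput }],
    stream: true,
  });
}
```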

Template Features

  • OpenAI API (for generating embeddings and GPT-3 responses)
  • Supabase (using their pgvector implementation as the vector database)
  • Next.js API routes (Edge runtime) - streaming
  • Tailwind CSS
  • Fonts with @next/font
  • Icons from Lucide
  • Dark mode with next-themes
  • Radix UI Primitives
  • Automatic import sorting with @ianvs/prettier-plugin-sort-imports

Getting Started

🍴 Huge thanks to @gannonh: most of the scraping and embedding logic came from his gpt3.5-turbo-pgvector repo

Set up Pinecone

  • Visit Pinecone to create and retrieve your API keys, and grab your environment and index name from the dashboard.

Set up Supabase

  • Create a Supabase account and project at https://app.supabase.com/sign-in.
  • First we'll enable the Vector extension. In Supabase, this can be done from the web portal through Database → Extensions. You can also do this in SQL by running:
create extension vector;
  • Next let's create a table to store our documents and their embeddings. Head over to the SQL Editor and run the following query:
create table documents (
  id bigserial primary key,
  content text,
  url text,
  embedding vector (1536)
);
  • Finally, we'll create a function that will be used to perform similarity searches. Head over to the SQL Editor and run the following query:
create or replace function match_documents (
  query_embedding vector(1536),
  similarity_threshold float,
  match_count int
)
returns table (
  id bigint,
  content text,
  url text,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.url,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > similarity_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;

Set up local environment

To create a new project based on this template using degit:

npx degit https://github.com/Jordan-Gilliam/ai-template ai-template
cd ai-template
code .
  • Install dependencies
npm i
  • Create a .env.local file in the root directory to store environment variables:
cp .env.example .env.local
  • Open the .env.local file and add your Supabase project URL and API key.

    You can find these in the Supabase web portal under Project → API. The API key should be stored in the SUPABASE_ANON_KEY variable and the project URL in NEXT_PUBLIC_SUPABASE_URL.

  • Add your OpenAI API key to .env.local. You can find it in the OpenAI web portal under API Keys. The key should be stored in the OPENAI_API_KEY variable.
  • Start the app
npm run dev





About

License: The Unlicense


Languages

TypeScript 98.5%, JavaScript 1.2%, CSS 0.4%