Naval GPT

AI-powered search & chat for Naval Ravikant's Twitter thread "How To Get Rich."

Demo: https://naval-gpt.vercel.app

(adding more content soon)

Everything is 100% open source.

Dataset

The dataset consists of 2 CSV files containing all text & embeddings used.

Download clips data here.

Download passages data here.

I recommend getting familiar with fetching, cleaning, and storing data as outlined in the scraping and embedding scripts below, but feel free to skip those steps and just use the dataset.
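If you just want to play with the dataset, you can rank rows in memory with plain cosine similarity instead of setting up a database. Here's a minimal TypeScript sketch; the { text, embedding } row shape is an assumption about the CSV columns, not a documented schema.

// Rank dataset rows against a query embedding using cosine similarity.
// The Row shape below is an assumption about the CSV contents.
type Row = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topMatches(queryEmbedding: number[], rows: Row[], k = 5): Row[] {
  return [...rows]
    .sort(
      (a, b) =>
        cosineSimilarity(queryEmbedding, b.embedding) -
        cosineSimilarity(queryEmbedding, a.embedding)
    )
    .slice(0, k);
}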

How It Works

Naval GPT provides 3 things:

  1. Search
  2. Chat
  3. Audio

Search

Search was created with OpenAI Embeddings (text-embedding-ada-002).

First, we loop over the passages from Naval's formatted blog post and generate embeddings for each chunk of text.

We work from the formatted blog post (rather than the raw tweets) and save the HTML so we can render the nicely formatted text in the app.

In the app, we take the user's search query, generate an embedding, and use the result to find the most similar passages.

The comparison is done using cosine similarity across our database of vectors.

Our database is a Postgres database with the pgvector extension hosted on Supabase.

Results are ranked by similarity score and returned to the user.
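At query time, the flow is roughly the sketch below. The match_passages RPC name, parameter names, and table shape are assumptions for illustration; the real SQL function is defined in schema.sql.

// Sketch of the search flow: embed the query, then let pgvector rank passages
// by cosine similarity. Names like "match_passages" are placeholders.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function searchPassages(query: string, matchCount = 5) {
  // 1. Embed the user's query with the same model used for the passages.
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input: query })
  });
  const { data } = await res.json();
  const queryEmbedding = data[0].embedding;

  // 2. Ask Postgres (pgvector) for the most similar passages.
  const { data: passages, error } = await supabase.rpc("match_passages", {
    query_embedding: queryEmbedding,
    match_count: matchCount
  });
  if (error) throw error;

  return passages; // ranked by similarity score
}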

Chat

Chat builds on top of search. It uses search results to create a prompt that is fed into GPT-3.5-turbo.

This allows for a chat-like experience where the user can ask questions about the topic and get answers.
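In code, the chat step is roughly: take the top passages from search, stuff them into a prompt, and call the model. The prompt wording below is illustrative, not the repo's exact prompt.

// Sketch of the chat step: build a prompt from the matched passages and ask
// gpt-3.5-turbo to answer using only that context.
export async function answerQuestion(
  question: string,
  passages: { content: string }[]
) {
  const context = passages.map((p) => p.content).join("\n\n");

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [
        {
          role: "system",
          content: "Answer the question using only the provided passages from Naval's thread."
        },
        { role: "user", content: `Passages:\n${context}\n\nQuestion: ${question}` }
      ],
      temperature: 0
    })
  });

  const json = await res.json();
  return json.choices[0].message.content as string;
}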

Audio

The podcast player is a simple audio player that plays the podcast for this thread.

We use Python and OpenAI Whisper to split the podcast into 1-minute chunks and transcribe each one, then generate an embedding for each transcript.

We then use the same method as search to find the most similar clip.

During audio processing we save a timestamp for each clip, so the app can jump straight to that point in the podcast.
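In the player, that jump is just a seek on the audio element. A tiny sketch, assuming each matched clip row carries a start_time in seconds (the field name is an assumption):

// Given clips already ranked by similarity (same flow as passage search),
// jump the player to the best match and start playback.
type Clip = { content: string; start_time: number }; // start_time in seconds

async function jumpToBestClip(rankedClips: Clip[], audio: HTMLAudioElement) {
  const best = rankedClips[0];
  audio.currentTime = best.start_time; // seek to the saved timestamp
  await audio.play();
}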

Running Locally

Here's a quick overview of how to run it locally.

Requirements

  1. Set up OpenAI

You'll need an OpenAI API key to generate embeddings.

  2. Set up Supabase and create a database

Note: You don't have to use Supabase. Use whatever method you prefer to store your data. But I like Supabase and think it's easy to use.

There is a schema.sql file in the root of the repo that you can use to set up the database.

Run it in the SQL editor in Supabase.

I recommend turning on Row Level Security and setting up a service role to use with the app.
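For reference, a server-side client with the service role key looks something like this (the service role bypasses RLS, so keep it in API routes and scripts only, never in client-side code):

// Server-side Supabase client using the service role key.
// The service role bypasses Row Level Security, so never expose it to the browser.
import { createClient } from "@supabase/supabase-js";

export const supabaseAdmin = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);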

Repo Setup

  1. Clone repo
git clone https://github.com/mckaywrigley/naval-gpt.git
  2. Install dependencies
npm i
  3. Set up environment variables

Create a .env.local file in the root of the repo with the following variables:

OPENAI_API_KEY=

NEXT_PUBLIC_SUPABASE_URL=
SUPABASE_SERVICE_ROLE_KEY=

You'll also need to export your OpenAI API key as an environment variable in your shell for the Python audio script:

export OPENAI_API_KEY=

Process Text

  1. Run text scraping script
npm run scrape

This scrapes the content from Naval's website and saves it to a json file.

  2. Run text embedding script
npm run embed-text

This reads the json file, generates embeddings for each passage, and saves the results to your database.

There is a 200ms delay between each request to avoid rate limiting.

This process will take 10-15 minutes.
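Under the hood, that loop looks something like the sketch below. The passages.json path, the naval_passages table, and its column names are assumptions; check the embed script and schema.sql for the real ones.

// Sketch of the embed step: read scraped passages, embed each one, and store
// the text + embedding in Postgres, pausing between requests to avoid rate limits.
import { readFileSync } from "fs";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

async function embedPassages() {
  // Output of the scrape step; file name and shape are placeholders.
  const passages: { content: string }[] = JSON.parse(
    readFileSync("passages.json", "utf8")
  );

  for (const passage of passages) {
    const res = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`
      },
      body: JSON.stringify({
        model: "text-embedding-ada-002",
        input: passage.content
      })
    });
    const { data } = await res.json();

    // Placeholder table/column names; see schema.sql for the actual schema.
    const { error } = await supabase.from("naval_passages").insert({
      content: passage.content,
      embedding: data[0].embedding
    });
    if (error) console.error(error);

    // 200ms pause between requests to stay under OpenAI rate limits.
    await new Promise((resolve) => setTimeout(resolve, 200));
  }
}

embedPassages();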

Process Audio

  1. Download podcast

Download the podcast and add it as "podcast.mp3" to the public directory.

  2. Run the audio processing script

Note: You'll need to have Python installed on your machine.

cd scripts

python3 main.py

This splits the podcast into 1-minute chunks and transcribes each chunk with Whisper.

The results are saved to a json file.

There is a 1.2s delay between each request to avoid rate limiting.

It will take 20-30 minutes to run.

  3. Run audio embedding script
npm run embed-audio

This reads the json file, generates embeddings for each clip, and saves the results to your database.

There is a 200ms delay between each request to avoid rate limiting.

This process will take about 5 minutes.

App

  1. Run app
npm run dev

Credits

Thanks to Naval Ravikant for sharing his thoughts publicly; they've proven to be an invaluable source of wisdom for all of us.

Contact

If you have any questions, feel free to reach out to me on Twitter!

Notes

I sacrificed composability for simplicity in the app.

You can split up a lot of the stuff in index.tsx into separate components.


License

MIT

