baajarmeh / bens-bites-ai-search

AI search for all the best resources in AI – powered by Ben's Bites 💯

Home Page: https://search.bensbites.co

Ben's Bites Link Search

Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.

Intro

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results are extracted from Ben's Bites AI Newsletter, which is used as a highly curated data source.

How it works

A cron job is run every 24 hours to update the database.

The steps involved include:

  1. Crawling the source Beehiiv newsletter
  2. Converting each post to markdown
  3. Extracting and resolving unique links
  4. Fetching opengraph metadata for each link
  5. Fetching provider-specific metadata for some links (e.g. tweet text)
  6. Generating vector embeddings for each link using OpenAI
  7. Upserting all links into a Pinecone vector database
  8. Upserting all links into a Meilisearch database
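As a rough illustration of step 3, here is a minimal sketch of pulling unique links out of a post's markdown. It is not taken from the project's source: the helper names `normalizeUrl` and `extractUniqueLinks` are hypothetical, and the normalization rules (dropping URL fragments and `utm_*` tracking parameters) are assumptions about what counts as a duplicate. Resolving redirects, which the real pipeline presumably also does, is left out.

```typescript
// Hypothetical helpers for illustration only; names and rules are assumptions.

// Normalize a URL so trivially different copies collapse to one key:
// drop the fragment and any utm_* tracking parameters.
function normalizeUrl(raw: string): string | null {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    return null // not a valid absolute URL, so skip it
  }
  url.hash = ''
  // Copy the keys first so we can delete while looping.
  for (const key of Array.from(url.searchParams.keys())) {
    if (key.toLowerCase().startsWith('utm_')) {
      url.searchParams.delete(key)
    }
  }
  return url.toString()
}

// Collect every inline markdown link of the form [text](http...)
// and return the normalized URLs in first-seen order, without duplicates.
function extractUniqueLinks(markdown: string): string[] {
  const linkPattern = /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g
  const seen = new Set<string>()
  let match: RegExpExecArray | null
  while ((match = linkPattern.exec(markdown)) !== null) {
    const normalized = normalizeUrl(match[1])
    if (normalized !== null) {
      seen.add(normalized)
    }
  }
  return Array.from(seen)
}
```

Two links that differ only by a tracking parameter or an anchor end up as a single entry, which keeps the later metadata and embedding steps from doing the same work twice.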

We use Iframely to extract opengraph metadata for each link, and we special-case tweet links to extract the tweet text.

Once we have all of the links locally, we upsert them into two databases:

  • A Pinecone vector database for semantic search
  • A Meilisearch database for traditional keyword search

Supporting both of these search indices isn't necessary, but I wanted to have a live comparison of the two approaches in action.

In general, I've found that semantic search is more accurate than keyword search, but keyword search is much faster and can be more intuitive for users.

Semantic Search

Semantic search is powered by OpenAI's `text-embedding-ada-002` embedding model and Pinecone's hosted vector database.
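To make the idea concrete, here is a small in-memory sketch of what a vector search does conceptually: score every stored embedding against the query embedding and return the closest matches. This is a stand-in, not the app's code. In the real app the embeddings come from OpenAI and the nearest-neighbor lookup is handled by Pinecone; the `EmbeddedLink` type, the `topK` function, and the choice of cosine similarity as the metric are assumptions made for illustration.

```typescript
// Illustrative stand-in for a hosted vector database; all names are hypothetical.
interface EmbeddedLink {
  url: string
  embedding: number[]
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions do not match')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) {
    return 0 // treat a zero vector as unrelated to everything
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force search: score every link, sort by score, keep the best k.
function topK(
  queryEmbedding: number[],
  links: EmbeddedLink[],
  k: number
): Array<{ url: string; score: number }> {
  return links
    .map((link) => ({
      url: link.url,
      score: cosineSimilarity(queryEmbedding, link.embedding)
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}
```

A brute-force scan like this is only workable for small collections; a hosted vector database exists precisely so that the lookup stays fast as the number of links grows.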

Keyword Search

Traditional keyword-based search is powered by Meilisearch.
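For comparison, a keyword query against Meilisearch can be expressed as a plain HTTP request to its search route (`POST /indexes/{index_uid}/search`). The sketch below only builds the request so it can be inspected without a running server. The function name `buildKeywordSearchRequest`, the index name `links`, and the host are assumptions for illustration, not values from this project, which may well use the official JavaScript client instead.

```typescript
// Hypothetical request builder; the index name, host, and key are placeholders.
interface KeywordSearchRequest {
  url: string
  init: {
    method: string
    headers: Record<string, string>
    body: string
  }
}

function buildKeywordSearchRequest(
  host: string,
  indexUid: string,
  apiKey: string,
  query: string,
  limit: number = 20
): KeywordSearchRequest {
  // Strip trailing slashes so the path is joined cleanly.
  const base = host.replace(/\/+$/, '')
  return {
    url: `${base}/indexes/${encodeURIComponent(indexUid)}/search`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`
      },
      // `q` is the search string; `limit` caps the number of hits returned.
      body: JSON.stringify({ q: query, limit })
    }
  }
}
```

The result can be handed to `fetch(request.url, request.init)`; the matching documents are returned in the `hits` array of the JSON response.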

TODO

  • better search UX so back button works
  • show the number of posts / links on the home page so it's clear when it was last updated
  • actually sort by recency instead of faking it
  • set up cron to update the DB daily
  • test on safari/firefox
  • display which newsletter the post first appeared in
  • explore hybrid search
  • infinite scroll so you can keep scrolling results

License

MIT © Travis Fischer

All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.

If you found this project interesting, please consider sponsoring me or following me on Twitter.

Languages

TypeScript 86.0%, CSS 12.8%, JavaScript 1.2%, Shell 0.0%