kyrolabs / sidekick

Open source ETL framework for retrieval augmented generation (RAG). Sync data from your SaaS tools to a vector store, where it can be easily queried by LLM apps.

Home Page: https://www.getsidekick.ai/

Connect your SaaS tools to a vector database and keep your data synced

Sidekick is a platform for integrating with SaaS tools like Salesforce, Github, Notion, and Zendesk, and for syncing data between these tools and a vector database. You can get started quickly with the integrations and chunkers built by the core team and community, or build new integrations and write custom chunkers for different content types based on Sidekick's DataConnector and DataChunker specs.
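The DataConnector and DataChunker specs live in the repo and are not reproduced here. Purely as an illustration of what a custom chunker does, here is a minimal sketch; the class names, `Chunk` fields, and `chunk` signature below are hypothetical and may differ from the actual spec.

```python
# Hypothetical sketch of a custom chunker. The real DataChunker spec is
# defined in the Sidekick repo and may use different names and signatures.
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    text: str
    source: str  # e.g. a URL or file path the chunk came from

class ParagraphChunker:
    """Splits a document into paragraph-sized chunks (illustrative only)."""

    def chunk(self, text: str, source: str, max_chars: int = 1000) -> List[Chunk]:
        chunks: List[Chunk] = []
        for para in text.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            # Split oversized paragraphs into fixed-width windows of max_chars
            for i in range(0, len(para), max_chars):
                chunks.append(Chunk(text=para[i:i + max_chars], source=source))
        return chunks
```

A chunker like this would be registered for a content type (e.g. Markdown) so that a connector can hand it raw documents and receive vector-store-ready chunks back.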

Demo

Watch the demo video of the Zendesk connector. To test out the cloud version, get an API key by creating an account on the Sidekick dashboard.

If you have any questions on how to get started, come join our Slack community!

💎 Features

  • Scrape HTML pages and chunk them
  • Load Markdown files from a Github repo and chunk them
  • Connect to Weaviate vector store and load chunks
  • FastAPI endpoints to query vector store directly, or perform Q&A with OpenAI models
  • Slackbot interface to perform Q&A with OpenAI models

Upcoming

  • DataConnector and DataChunker abstractions to make it easier to contribute new connectors/chunkers
  • Connect to Pinecone, Milvus, and Qdrant vector stores

Getting Started - 15 min

To use the cloud version for free:

  1. Create an account on the Sidekick dashboard.
  2. Choose the Connector you want to use and click "Authorize".
  3. After successfully authorizing, click "Connect" to load the data.
  4. If the data was loaded successfully, you will see a message showing how many chunks were uploaded.
  5. Copy the API key from the "API Keys" page
  6. Click "API Testing" on the dashboard to access a FastAPI docs page where you can try out the different endpoints.
  7. Click "Authorize" in the FastAPI page and paste in your API key as the bearer token.
  8. Click "Try it out" on the /ask-llm endpoint to get an answer to a question based on the data you just ingested. This endpoint performs a semantic search, then uses OpenAI's text-davinci API to summarize the results into an answer to the query. Leave possible_intents empty.
  9. Alternatively, use the /query endpoint to do only a semantic search and return matching chunks from your data.
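The same endpoints can be called programmatically with your API key as the bearer token. As a sketch: the base URL and payload fields (`query`, `top_k`) below are assumptions; check the FastAPI docs page from step 6 for the exact host and schema.

```python
# Illustrative client for the hosted /query endpoint. The API_BASE host
# and the payload field names are assumptions -- confirm them against the
# interactive FastAPI docs page before use.
import json
import urllib.request

API_BASE = "https://api.getsidekick.ai"  # assumed host; see your dashboard
API_KEY = "your-api-key"                 # from the "API Keys" page

def build_query_request(query: str, top_k: int = 3) -> urllib.request.Request:
    """Build an authenticated POST request for the /query endpoint."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode()
    return urllib.request.Request(
        f"{API_BASE}/query",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_query_request("How do I reset my password?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```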

To run Sidekick locally:

  1. Install Python 3.10, if not already installed.

  2. Clone the repository: git clone https://github.com/ai-sidekick/sidekick.git

  3. Navigate to the sidekick-server directory: cd /path/to/sidekick/sidekick-server

  4. Install poetry: pip install poetry

  5. Create a new virtual environment with Python 3.10: poetry env use python3.10

  6. Activate the virtual environment: poetry shell

  7. Install poetry-dotenv: poetry self add poetry-dotenv

  8. Install app dependencies: poetry install

  9. Set the required environment variables in a .env file in sidekick-server:

    DATASTORE=weaviate
    BEARER_TOKEN=<your_bearer_token> # Can be any string when running locally, e.g. 22c443d6-0653-43de-9490-450cd4a9836f
    OPENAI_API_KEY=<your_openai_api_key>
    WEAVIATE_HOST=<your_weaviate_host_address> # Optional, defaults to http://127.0.0.1
    WEAVIATE_PORT=<your_weaviate_port> # Optional, defaults to 8080; set to 443 for Weaviate Cloud
    WEAVIATE_INDEX=<your_weaviate_class_name> # The Weaviate class/collection where your chunks are stored, e.g. MarkdownChunk
    

    Note that Weaviate is currently the only supported data store. You can run Weaviate locally with Docker or set up a sandbox cluster to get a Weaviate host address.

  10. Run the API locally: poetry run start

  11. Access the API documentation at http://0.0.0.0:8000/docs and test the API endpoints (make sure to add your bearer token).
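Before starting the server, it can be useful to verify that the .env file from step 9 defines everything required. This small check is not part of Sidekick; it is a standalone helper that only uses the variable names listed above.

```python
# Standalone sanity check for the .env file from step 9 (run from the
# sidekick-server directory). The variable names match the list above;
# the script itself is a convenience helper, not part of Sidekick.
from pathlib import Path

REQUIRED = ["DATASTORE", "BEARER_TOKEN", "OPENAI_API_KEY"]
OPTIONAL = ["WEAVIATE_HOST", "WEAVIATE_PORT", "WEAVIATE_INDEX"]

def check_env(path: str = ".env") -> list:
    """Return the required variables missing from the .env file."""
    text = Path(path).read_text() if Path(path).exists() else ""
    defined = {
        line.split("=", 1)[0].strip()
        for line in text.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    }
    return [k for k in REQUIRED if k not in defined]

if __name__ == "__main__":
    missing = check_env()
    print("OK" if not missing else f"Missing: {', '.join(missing)}")
```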

For support and questions, join our Slack community.

API Endpoints

The server is built on FastAPI, so when running locally you can view the interactive API documentation at http://0.0.0.0:8000/docs.

These are the available API endpoints:

  • /upsert-web-data: Takes a URL as input, uses Playwright to crawl the webpage (and any linked webpages), and loads the resulting chunks into the vector store.

  • /query: Endpoint to query the vector database with a string. You can filter by source type (web, markdown, etc.) and set the max number of chunks returned.

  • /ask-llm: Endpoint to get an answer to a question from an LLM, based on the data in the vector store. The response includes the sources used in the answer, the user's intent, and whether or not the question is answerable based on the content in your vector store.
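The /ask-llm response can be handled in a few lines. As a sketch only: the field names below (`answer`, `sources`, `intent`, `answerable`) are assumptions inferred from the description above, not the confirmed schema; check the interactive docs for the real response model.

```python
# Illustrative handling of an /ask-llm response. The field names
# (answer, sources, answerable) are assumptions based on the endpoint
# description -- verify them against the interactive API docs.
def summarize_response(resp: dict) -> str:
    """Render an /ask-llm-style response dict as a one-line answer."""
    if not resp.get("answerable", True):
        return "No answer found in the ingested data."
    sources = ", ".join(resp.get("sources", []))
    return f"{resp.get('answer', '')} (sources: {sources})"
```

Checking the answerability flag before showing the answer is the point here: it lets a client distinguish "the model summarized real matches" from "nothing relevant was in the vector store".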

Contributing

See CONTRIBUTING.md

License

GNU Affero General Public License v3.0


Languages

TypeScript 83.6%, Python 15.1%, CSS 0.7%, Dockerfile 0.2%, JavaScript 0.2%, HTML 0.1%, Makefile 0.1%