Connect your SaaS tools to a vector database and keep your data synced
Sidekick is a platform for integrating with SaaS tools such as Salesforce, GitHub, Notion, and Zendesk, and for syncing data between these tools and a vector database. You can use the integrations and chunkers built by the core team and community to get started quickly, or build new integrations and write custom chunkers for different content types based on Sidekick's DataConnector and DataChunker specs.
Demo Video with the Zendesk connector. Get an API key to test out the cloud version by creating an account on the Sidekick dashboard.
If you have any questions on how to get started, come join our Slack community!
- Scrape HTML pages and chunk them
- Load Markdown files from a GitHub repo and chunk them
- Connect to Weaviate vector store and load chunks
- FastAPI endpoints to query vector store directly, or perform Q&A with OpenAI models
- Slackbot interface to perform Q&A with OpenAI models
- DataConnector and DataChunker abstractions to make it easier to contribute new connectors/chunkers
- Connect to Pinecone, Milvus, and Qdrant vector stores
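To illustrate the chunking idea in the list above, here is a minimal sketch of a Markdown chunker that emits one chunk per heading section. The `Chunk` dataclass, the shape of the `DataChunker` base class, and `MarkdownHeadingChunker` are hypothetical names chosen for this example; they are not Sidekick's actual DataChunker spec, which lives in the repository.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record: a section title plus the text under it."""
    title: str
    text: str


class DataChunker(ABC):
    """Hypothetical interface for illustration; Sidekick's real spec may differ."""

    @abstractmethod
    def chunk(self, content: str) -> list[Chunk]:
        ...


class MarkdownHeadingChunker(DataChunker):
    """Splits Markdown into one chunk per heading section."""

    def chunk(self, content: str) -> list[Chunk]:
        chunks: list[Chunk] = []
        title: str = ""
        lines: list[str] = []
        for line in content.splitlines():
            if line.startswith("#"):
                # A new heading closes the previous section, if it had any text.
                if any(l.strip() for l in lines):
                    chunks.append(Chunk(title, "\n".join(lines).strip()))
                title, lines = line.lstrip("#").strip(), []
            else:
                lines.append(line)
        # Flush the final section; sections with no body text are skipped.
        if any(l.strip() for l in lines):
            chunks.append(Chunk(title, "\n".join(lines).strip()))
        return chunks
```

This sketch treats every line starting with `#` as a heading, so `#` comments inside fenced code would also split a section; a production chunker would need to track fences and likely cap chunk size.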
To use the cloud version for free:
- Create an account on the Sidekick dashboard.
- Choose the Connector you want to use and click "Authorize".
- After successfully authorizing, click "Connect" to load the data.
- If the data was successfully loaded, you will see a message showing how many chunks were uploaded.
- Copy the API key from the "API Keys" page.
- Click "API Testing" on the dashboard to access a FastAPI docs page where you can try out the different endpoints.
- Click "Authorize" in the FastAPI page and paste in your API key as the bearer token.
- Click "Try it out" on the /ask-llm endpoint to get an answer to a question based on the data that you just ingested. This endpoint performs a semantic search, then uses OpenAI's text-davinci API to summarize the results as an answer to the query. Leave possible_intents empty.
- Alternatively, use the /query endpoint to do only a semantic search and return matching chunks from your data.
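The same /ask-llm call can be made outside the docs page. The following is a minimal sketch using only the Python standard library. `BASE_URL` is a placeholder for the address shown on your "API Testing" page, and the `queries`/`query` field names are assumptions about the request body; only `possible_intents` and the bearer-token scheme come from the steps above. Check the schema on the FastAPI docs page before relying on it.

```python
import json
import urllib.request

# Assumption: placeholder host; use the base URL of your "API Testing" page.
BASE_URL = "https://example.invalid"


def build_ask_llm_request(api_key: str, question: str) -> urllib.request.Request:
    """Builds (but does not send) an authenticated POST to /ask-llm."""
    # Assumption: "queries" and "query" are guessed field names.
    # "possible_intents" is left empty, as the steps above recommend.
    payload = {"queries": [{"query": question}], "possible_intents": []}
    return urllib.request.Request(
        f"{BASE_URL}/ask-llm",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Sends the request and decodes the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.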
To run Sidekick locally:
- Install Python 3.10, if not already installed.
- Clone the repository: git clone https://github.com/ai-sidekick/sidekick.git
- Navigate to the sidekick-server directory: cd /path/to/sidekick/sidekick-server
- Install poetry: pip install poetry
- Create a new virtual environment with Python 3.10: poetry env use python3.10
- Activate the virtual environment: poetry shell
- Install poetry-dotenv: poetry self add poetry-dotenv
- Install app dependencies: poetry install
- Set the required environment variables in a .env file in sidekick-server:
    DATASTORE=weaviate
    BEARER_TOKEN=<your_bearer_token> # Can be any string when running locally, e.g. 22c443d6-0653-43de-9490-450cd4a9836f
    OPENAI_API_KEY=<your_openai_api_key>
    WEAVIATE_HOST=<Your Weaviate instance host address> # Optional, defaults to http://127.0.0.1
    WEAVIATE_PORT=<Your Weaviate port number> # Optional, defaults to 8080. Should be set to 443 for Weaviate Cloud
    WEAVIATE_INDEX=<Your chosen Weaviate class/collection name to store your chunks> # e.g. MarkdownChunk
  Note that we currently only support Weaviate as the data store. You can run Weaviate locally with Docker or set up a sandbox cluster to get a Weaviate host address.
- Run the API locally: poetry run start
- Access the API documentation at http://0.0.0.0:8000/docs and test the API endpoints (make sure to add your bearer token).
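Before starting the server, it can help to confirm that the environment variables from the setup steps are present. The following is a minimal sketch of such a check, applying the documented defaults for WEAVIATE_HOST and WEAVIATE_PORT. The `Settings` class and `load_settings` function are hypothetical helpers written for this example; they are not part of Sidekick's own startup code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env step."""
    datastore: str
    bearer_token: str
    openai_api_key: str
    weaviate_url: str
    weaviate_index: str


def load_settings(env=None) -> Settings:
    """Validates required variables and fills in the documented defaults."""
    env = os.environ if env is None else env
    required = ["DATASTORE", "BEARER_TOKEN", "OPENAI_API_KEY", "WEAVIATE_INDEX"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    if env["DATASTORE"] != "weaviate":
        # Weaviate is the only data store the setup notes describe as supported.
        raise ValueError("DATASTORE must be 'weaviate'")
    host = env.get("WEAVIATE_HOST", "http://127.0.0.1")
    port = env.get("WEAVIATE_PORT", "8080")
    return Settings(
        datastore=env["DATASTORE"],
        bearer_token=env["BEARER_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        weaviate_url=f"{host}:{port}",
        weaviate_index=env["WEAVIATE_INDEX"],
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in isolation, for example `load_settings({"DATASTORE": "weaviate", ...})`.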
For support and questions, join our Slack community.
The server is based on FastAPI, so you can view the interactive API documentation at <local_host_url>/docs (e.g. http://0.0.0.0:8000/docs) when you are running it locally.
These are the available API endpoints:
- /upsert-web-data: This endpoint takes a url as input, uses Playwright to crawl through the webpage (and any linked webpages), and loads them into the vectorstore.
- /query: Endpoint to query the vector database with a string. You can filter by source type (web, markdown, etc.) and set the max number of chunks returned.
- /ask-llm: Endpoint to get an answer to a question from an LLM, based on the data in the vectorstore. In the response, you get back the sources used in the answer, the user's intent, and whether or not the question is answerable based on the content in your vectorstore.
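The request bodies for the first two endpoints might be assembled as in the sketch below. Only the `url` input, the query string, the source-type filter, and the chunk limit are described above; the JSON key names used here (`queries`, `query`, `filter`, `source_type`, `top_k`) are assumptions made for illustration, so confirm them against the schema on the /docs page. `post_json` is a hypothetical helper, and the base URL assumes a locally running server on port 8000.

```python
import json
import urllib.request

# The local server listens on port 8000; 127.0.0.1 reaches it from the same machine.
BASE_URL = "http://127.0.0.1:8000"


def upsert_web_data_payload(url: str) -> dict:
    """Body for /upsert-web-data; the endpoint takes a url (key name assumed)."""
    return {"url": url}


def query_payload(text: str, source_type=None, max_chunks=None) -> dict:
    """Body for /query with optional source-type filter and chunk limit."""
    # Assumption: all key names below are hypothetical.
    query: dict = {"query": text}
    if source_type is not None:
        query["filter"] = {"source_type": source_type}
    if max_chunks is not None:
        query["top_k"] = max_chunks
    return {"queries": [query]}


def post_json(path: str, payload: dict, bearer_token: str) -> dict:
    """POSTs a JSON payload with bearer auth and returns the decoded response."""
    request = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, a call such as `post_json("/query", query_payload("How do I reset my password?", "web", 3), token)` would return the matching chunks, provided the assumed key names match the real schema.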
See CONTRIBUTING.md
- The boilerplate for this project is based on the ChatGPT Retrieval Plugin
- The licensing for this project is inspired by Airbyte's licensing model