This is a proof-of-concept repo for vectorstore loader orchestration. It combines LangChain with Dagster.
- Install dependencies:
pip install -r requirements.txt
- Set the OPENAI_API_KEY environment variable:
export OPENAI_API_KEY={YOUR_API_KEY}
- Run
dagster dev -f run.py
and navigate to http://127.0.0.1:3000/ to view the dag jobs you have created.
- By default, run.py stores the output in a local vectorstore .pkl file. See the alternative storage options below.
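As a rough illustration of the local .pkl storage mentioned above, the sketch below writes an object to a pickle file and reads it back. The helper names `save_vectorstore` and `load_vectorstore` and the file name are hypothetical and are not taken from run.py; the real file name and vectorstore class depend on how run.py is configured.

```python
import pickle
from pathlib import Path


def save_vectorstore(vectorstore, path):
    """Serialize any picklable object (hypothetical helper, not from run.py)."""
    with Path(path).open("wb") as f:
        pickle.dump(vectorstore, f)


def load_vectorstore(path):
    """Read the object back from disk. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Usage (the file name is an assumption):
# save_vectorstore(store, "vectorstore.pkl")
# store = load_vectorstore("vectorstore.pkl")
```

Unpickling a real vectorstore requires the same library versions that were installed when the file was written.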
For testing, I have set up two dags in the dag folder that share common loading logic. To add a new loader run, create a new file containing a Dagster job and add it to run.py. You can use any LangChain document loader to load your own data into a vectorstore.
To run with Pinecone Vector Database you will need to:
- Sign up for a Pinecone account.
- Navigate to the API Key page and set two environment variables:
export PINECONE_API_KEY={YOUR_API_KEY}
and
export PINECONE_ENVIRONMENT={YOUR_ENVIRONMENT}
- Uncomment the Pinecone def in run.py and comment out the previous def.
- Run
dagster dev -f run.py
and navigate to http://127.0.0.1:3000/ to view the dag jobs you have created.
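Because the Pinecone setup depends on three environment variables (the OpenAI key plus the two Pinecone values), a small pre-flight check can catch a missing one before starting Dagster. The sketch below is optional and not part of the repo; the function name `missing_env_vars` is hypothetical.

```python
import os

# Variables named in the setup steps of this README.
REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Usage: call missing_env_vars() before `dagster dev -f run.py`
# and stop if the returned list is not empty.
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.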
This repo can replace the manual run of ingest.sh in ChatLangChain, providing scheduled vectorstore updates and managing inputs from various sources.