This workshop
- is an introduction to the Python ecosystem of embeddings and vector databases.
- demonstrates how to build a search app.
- is designed to be delivered in person.
- is not a deep dive into all the technologies involved.
Visit https://bit.ly/techspace-llm-workshop for the workshop material as a Google Colab notebook.
I recommend keeping the workshop contained in a conda environment, if you can.
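For example, a dedicated environment could be set up like this (the environment name `llm-workshop` and the Python version are illustrative choices, not requirements of the workshop):

```shell
# Create and activate an isolated conda environment for the workshop.
# Name and Python version here are my own picks; adjust as you like.
conda create -n llm-workshop python=3.11 -y
conda activate llm-workshop
```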
A machine or an environment (Google Colab, Kaggle, etc.) that supports:
- Python
- LangChain
- Chroma, an open-source, lightweight embedding database
- Pandas, for data transformations
- SQLite, plus a SQLite browser to view the records
- FastAPI and uvicorn
- An OpenAI API key (sign up with OpenAI and create one)
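One way to install the dependencies above and expose your OpenAI key is sketched below (package names are the usual PyPI names for the listed tools; the key value is a placeholder you replace with your own):

```shell
# Install the workshop dependencies listed in the prerequisites.
pip install langchain chromadb pandas fastapi uvicorn openai

# Make the OpenAI key available to Python code (placeholder value).
export OPENAI_API_KEY="sk-..."
```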
- What are embeddings?
- What are vector databases?
- What is Retrieval Augmented Generation (RAG)?
- Building a search engine
- Select a textbook (preselected for the workshop)
- Create chunks from the book pages using LangChain text splitter utilities
- Embed chunks in Chroma
- Build a query service
- Use RAG to summarize an answer to the user's question
- Host with FastAPI, if time permits
- Troubleshooting
- Q&A, Discussion
- Appendix
- Tooling
- Python Ecosystem
Skeleton utilities for all of these steps will be provided during the workshop.
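As a rough illustration of the chunking step, here is a toy pure-Python splitter: fixed-size chunks with overlap, which is the core idea behind LangChain's text splitter utilities (the real `RecursiveCharacterTextSplitter` additionally prefers paragraph and sentence boundaries; the sizes below are arbitrary):

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    A toy stand-in for LangChain's text splitters, just to show
    why neighbouring chunks share some text (overlap preserves
    context that would otherwise be cut at a chunk boundary).
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

page = ("word " * 60).strip()  # a fake ~300-character book page
chunks = split_text(page, chunk_size=100, overlap=20)
```

Each chunk ends with the 20 characters that the next chunk starts with, so no sentence is lost entirely at a boundary.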
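Under the hood, an embedding database like Chroma answers a query by nearest-neighbour search over vectors. A minimal sketch of that idea with hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and Chroma handles the indexing for you):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend these are embeddings of three book chunks (made-up values).
index = {
    "chunk-cats": [0.9, 0.1, 0.0],
    "chunk-dogs": [0.8, 0.2, 0.1],
    "chunk-math": [0.0, 0.1, 0.9],
}

# Pretend embedding of the user's question, close to the "cats" chunk.
query = [0.85, 0.15, 0.05]
best = max(index, key=lambda cid: cosine_similarity(query, index[cid]))
```

A real query service would embed the question with the same model used for the chunks, then ask Chroma for the top-k most similar chunk IDs.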
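The RAG step is, at its core, prompt assembly: the retrieved chunks are pasted into the prompt ahead of the user's question, and the LLM is asked to answer from that context only. A minimal sketch (the prompt wording is my own; in the workshop this string would be sent to an OpenAI chat model):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a RAG prompt: retrieved context first, then the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What do cats eat?",
    ["Cats are obligate carnivores.", "Kittens need frequent meals."],
)
# `prompt` would now be sent to the chat model as the user message.
```

Grounding the model in retrieved text this way is what lets it answer questions about the textbook without having been trained on it.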
TBD
Email me with any questions: bhanu@collab.place