An open-source initiative to document and share experiments that apply Retrieval Augmented Generation (RAG) techniques to Threat Intelligence search capabilities.
docker build . -t rag-chroma
Create a .env file and define the OPENAI_API_KEY variable with your OpenAI key. The key is only needed to use LangChain's ChatOpenAI module. It is not needed to embed the ATT&CK Groups data, which is done with the all-mpnet-base-v2 sentence-transformers model ;)
OPENAI_API_KEY=XXXXXXXXX
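Before starting the container, it can help to confirm that the .env file really defines the key. The snippet below is a minimal sketch using only the Python standard library. The helper names `read_env_file` and `has_openai_key` are hypothetical and not part of this project, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def has_openai_key(path=".env"):
    """Return True when OPENAI_API_KEY is set to something other than the X placeholder."""
    if not Path(path).is_file():
        return False
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    # An all-"X" value is treated as the unedited placeholder from the example.
    return bool(key) and set(key) != {"X"}
```

Calling `has_openai_key()` from the directory that holds the .env file returns `False` while the placeholder value is still in place.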
docker run -it --rm --name rag-chroma --env-file .env -p 8080:8080 rag-chroma
After running that command, the container will:
- Download the all-mpnet-base-v2 sentence-transformers model (~400MB).
- Download the Hugging Face Cyb3rWard0g/ATTCKGroups dataset (~846KB).
- Process the dataset by tokenizing and embedding every ATT&CK Group.
- Create the vector database by adding all the embeddings into a local Chroma Database.
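The embed-then-store steps above can be illustrated with a toy version in plain Python. This is only a sketch of the idea, not the container's actual code: a bag-of-words vector stands in for the all-mpnet-base-v2 model, an in-memory list stands in for the Chroma database, and the two records are made-up placeholders rather than entries from the ATT&CK Groups dataset.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: unit-length bag-of-words vector (stand-in for all-mpnet-base-v2)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit vectors stored as sparse dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class ToyVectorStore:
    """In-memory stand-in for the local Chroma database."""

    def __init__(self):
        self.rows = []  # list of (doc_id, text, vector)

    def add(self, doc_id, text):
        # Embed once at insert time, as a vector database would.
        self.rows.append((doc_id, text, embed(text)))

    def query(self, question, k=1):
        # Embed the question and return the k most similar documents.
        q = embed(question)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


# Hypothetical placeholder records, not real ATT&CK Groups entries.
store = ToyVectorStore()
store.add("group-a", "Group A targets energy companies using spearphishing attachments.")
store.add("group-b", "Group B deploys ransomware against hospitals and healthcare providers.")
print(store.query("Which group uses ransomware against hospitals?"))
```

The query returns the `group-b` record because it shares the most terms with the question. A sentence-transformers model plays the same role but matches on meaning rather than exact word overlap.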
Browse to http://127.0.0.1:8080/rag-chroma/playground and start asking questions.
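Questions can probably also be sent from a script. The sketch below rests on two assumptions that this README does not confirm: that the app is served with LangServe, which exposes an `/invoke` route next to `/playground`, and that the chain takes the question as a plain string. It uses port 8080, the port published by the `docker run` command. The function names `build_invoke_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: port 8080 from `docker run -p 8080:8080`, plus a
# LangServe-style `/invoke` route under the same path as the playground.
BASE_URL = "http://127.0.0.1:8080/rag-chroma"


def build_invoke_request(question, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying the question as JSON."""
    body = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL, timeout=60):
    """Send the question to the running container and return the decoded JSON reply."""
    request = build_invoke_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `ask("Which groups target the energy sector?")` would return the server's JSON reply. If the route or input shape differs, the playground page remains the documented way in.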