Intel Retrieval Augmented Generation (RAG) Utilities

An open-source initiative to document and share experiments that apply Retrieval Augmented Generation (RAG) techniques to threat intelligence search.

Build Docker Image

docker build . -t rag-chroma

Define .env File

Create a .env file and set the OPENAI_API_KEY variable to your OpenAI API key. The key is required by LangChain's ChatOpenAI module. It is not needed to embed the ATT&CK Groups data, which is done with the all-mpnet-base-v2 sentence-transformers model.

OPENAI_API_KEY=XXXXXXXXX
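
Before starting the container, it can help to confirm that the .env file actually defines the key. The following is a minimal sketch, not part of this repository: read_env_file and has_openai_key are hypothetical helper names, and the parser is deliberately simplified (KEY=VALUE lines and # comments only).

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def has_openai_key(env: Dict[str, str]) -> bool:
    """Return True if OPENAI_API_KEY is set and is not the XXXXXXXXX placeholder."""
    key = env.get("OPENAI_API_KEY", "")
    return bool(key) and set(key) != {"X"}
```

For example, has_openai_key(read_env_file(".env")) returns False while the file still contains the XXXXXXXXX placeholder shown above.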

Run Docker Image

docker run -it --rm --name rag-chroma --env-file .env -p 8080:8080 rag-chroma

After running that command, the container will:

Download the all-mpnet-base-v2 sentence-transformers model (~400MB).
Download the Hugging Face Cyb3rWard0g/ATTCKGroups dataset (~846KB).
Process the dataset by tokenizing and embedding every ATT&CK Group.
Create the vector database by adding all the embeddings into a local Chroma Database.
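
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code the container runs: the model and dataset names come from this README, while the split name, the text_field column, the collection name, the persist directory, and the batched and build_index helpers are all hypothetical. The sketch embeds each record whole and leaves tokenization to sentence-transformers; any document splitting the container performs is not reproduced here.

```python
from typing import Iterator, List, Sequence

MODEL_NAME = "all-mpnet-base-v2"          # named in this README
DATASET_NAME = "Cyb3rWard0g/ATTCKGroups"  # named in this README


def batched(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_index(
    persist_dir: str = "./chroma_db",        # assumption: local storage path
    collection_name: str = "attck_groups",   # assumption: collection name
    text_field: str = "text",                # assumption: dataset column name
    split: str = "train",                    # assumption: dataset split name
    batch_size: int = 32,
) -> int:
    """Embed every record and add it to a local Chroma collection."""
    # Imported lazily so the helper above works without these packages.
    import chromadb
    from datasets import load_dataset
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)            # downloads the model
    dataset = load_dataset(DATASET_NAME, split=split)  # downloads the dataset
    texts = [str(row[text_field]) for row in dataset]

    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection(name=collection_name)

    count = 0
    for batch in batched(texts, batch_size):
        # encode() tokenizes internally and returns one vector per text.
        embeddings = model.encode(batch).tolist()
        ids = [f"group-{count + i}" for i in range(len(batch))]
        collection.add(ids=ids, embeddings=embeddings, documents=batch)
        count += len(batch)
    return count
```

Calling build_index() requires the chromadb, datasets, and sentence-transformers packages plus network access, and returns the number of records added.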

Explore Playground

Browse to http://127.0.0.1:8080/rag-chroma/playground and start asking questions.
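
If the page does not load, a quick reachability check can tell whether the container is listening. This is a small sketch using only the Python standard library; it assumes the 8080 port mapping from the docker run command above, and playground_url and is_reachable are hypothetical helper names.

```python
from urllib.request import urlopen


def playground_url(host: str = "127.0.0.1", port: int = 8080,
                   app_path: str = "rag-chroma") -> str:
    """Build the playground URL for the published container port."""
    return f"http://{host}:{port}/{app_path}/playground"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

is_reachable(playground_url()) should return True once the container has finished downloading the model and building the database.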
