Cisco Documentation RAG LLM

LLM Documentation Demo for Nexus Dashboard

A demo application showcasing how to run a local LLM on your own hardware. Includes samples that leverage open-source libraries (llama.cpp) and models (llama), as well as documentation from Nexus Dashboard.

Installation

First clone the project and navigate into project directory

git clone https://github.com/ndavidson19/ciscolive.git
cd ciscolive/ciscolive-demo/documentation-llm

Next you must download the modelfile. Huggingface has so many models to choose from and all have very elaborate names. We will be choosing a DPO finetuned version of StableLM. This small 3B model punches above its weight when it comes to RAG applications.

Rocket 3B

Next create a directory called llm in the backend folder

cd /cisco-live/documentation-llm/backend
mkdir llm

Then move the modelfile to the correct directory ciscolive/ciscolive-demo/documentation-llm/backend/llm/dolphin-2.6-mistral-7b-dpo-laser.Q4_K_M.gguf

cd /cisco-live/documentation-llm/backend
mkdir llm

Usage

This entire application has been dockerized and can be run with just

docker-compose up --build

This starts three different services.

The Vector Datastore (pgvector)
- This pulls a postgres image from ankane/pgvector that installs the correct extensions for allow vectors within postgres.
The Flask serving APIs and VectorDB insertion
- This service starts a flask API endpoint route (/get_message) on port :5000 that allows for a user to send queries to the LLM being served using LlamaCPP (https://github.com/abetlen/llama-cpp-python) using script at /backend/main.py
- This service also parses the pdf living in /training/pdfs/ using /training/pdf.py and then inserts it into the database using /training/db-embeddings.py
The UI service
- Uses nginx to start a basic webserver for the basic index.html file

Note: This is a very simplistic scaled down version of our full architecture we are running in production and should be treated as a starting point. Look into the llama-cpp-python OpenAI compatible webserver if you are going to be creating your own application.

Manual Usage

It is recommended to create a virtual-env before installing dependencies. Or use a dependency manager such as anaconda. Ex.

python3 -m venv venv_name
source venv_name/bin/activate

pip install -r requirements.txt

Next you must download the modelfile. Rocket 3B

Next move the modelfile to the correct directory /cisco-live/documentation-llm/backend/llm/llama-2-7b-chat.Q4_K_M.gguf

cd /cisco-live/documentation-llm/backend
mkdir llm

Deployment Scripts

Database Setup:

Run the PostgreSQL vector extension for embeddings:

docker pull ankane/pgvector
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret -e POSTGRES_USER=postgres ankane/pgvector

Training Pipeline:
- Navigate to the training directory.
- Run pdf.py to parse PDFs and db-embeddings.py to store embeddings:
```
python pdf.py
python db-embeddings.py
```
Start the Backend:
- Use llama-cpp-python OpenAI compatible webserver for managing model serving.
```
python3 -m llama_cpp.server --config_file /<USER_PATH>/documentation-llm/backend/llm/config.json
```
- Start the backend services located in backend/inference:
```
python main.py
```

Load UI (html)

Run the below command in the root directory of the project.

python -m http.server

Navigate to http://localhost:8000/ in your browser. To load the UI you just need to open the index.html file that lives in the cisco-live/documentation-llm/ui directory.

You should be all set to start asking questions!

Licensing info

A license is required for others to be able to use your code. An open source license is more than just a usage license, it is license to contribute and collaborate on code. Open sourcing code and contributing it to Code Exchange requires a commitment to maintain the code and help the community use and contribute to the code. More about open-source licenses

isheriff123 / ciscolive