kamran-redis/rag

Sample notebook from LangChain: Chat with Your Data, using Redis as the vector store for RAG together with a local LLM and local embeddings.

Besides Python 3.10 and Jupyter, you will need:
- Redis Stack: docker run -it --name redis-stack -p 6379:6379 -p 8001:8001 --rm redis/redis-stack:7.2.0-v0
- LLM model
- Inference Engine

If you are new to Jupyter, the commands below create a virtual environment and start the notebook server:

  python3.10 -m venv venv
  . ./venv/bin/activate
  pip install notebook
  jupyter notebook

Or install the frozen versions from requirements.txt:

  python3.10 -m venv venv
  . ./venv/bin/activate
  pip install -r requirements.txt
  jupyter notebook

LLM Model

Meta has released the Llama 2 models. Running them requires a GPU and plenty of memory, and for consumers an Apple M1/M2 machine is the most convenient way to run a model locally. You can download the models from Hugging Face. I used the following models.

Model	Notes
llama-2-7b-chat.Q6_K.gguf	Use this if you do not have a powerful GPU or have only 8GB RAM (M1 Max performance ~30 tokens/second). This is the smallest Llama 2 model.
llama-2-13b-chat.Q6_K.gguf	Needs 16GB RAM and a GPU (M1 Max performance ~16 tokens/second).
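
The RAM figures in the table roughly follow from the size of the quantized weights. As a back-of-the-envelope sketch: file size is approximately parameter count times bits per weight, divided by 8. The numbers below are assumptions that do not come from this README: about 6.5625 bits per weight for the Q6_K format, and about 6.74 billion and 13.02 billion parameters for the two models. The function name `estimated_size_gb` is made up for illustration.

```python
# Assumed value (not from this README): Q6_K stores about 6.5625 bits per weight.
Q6_K_BITS_PER_WEIGHT = 6.5625


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed parameter counts for the two models listed in the table.
    models = (("llama-2-7b-chat", 6.74e9), ("llama-2-13b-chat", 13.02e9))
    for name, n_params in models:
        size = estimated_size_gb(n_params, Q6_K_BITS_PER_WEIGHT)
        print(f"{name} Q6_K: ~{size:.1f} GB")
```

Under these assumptions the weights alone come to about 5.5 GB and 10.7 GB. The context cache and the rest of the system need memory on top of that, which is consistent with the 8GB and 16GB guidance above.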

Inference Engine

llama.cpp, the inference engine that many other local LLM tools build on. To run it as an OpenAI-compatible API server, see the server example. I used the following commands to compile and run the server:

# Build with Metal (Apple GPU) support
LLAMA_METAL=1 make

# Run the server
./server -m ~/.cache/lm-studio/models/thebloke/Llama-2-70B-Chat-GGML/llama-2-70b-chat.ggmlv3.q4_K_S.bin -ngl 1 -gqa 8 --ctx-size 4000 -v


# In a separate terminal, run the OpenAI-compatible API server
python3.10 -m venv venv
. ./venv/bin/activate
pip install flask
pip install requests
python examples/server/api_like_OAI.py

# Then test it with
curl http://localhost:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {  "role": "user", "content":"Tell me about the capabilities of redis enterprise and when to use it" }
  ],
  "temperature": 0.2,
  "max_tokens": 100,
  "stream": false
}'
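
The same request can be sent from Python, which is closer to how a notebook would talk to the server. This is a minimal sketch using only the standard library. The URL and request fields are copied from the curl example above. The helper names (`build_payload`, `extract_answer`, `ask`) are made up for illustration, and the response parsing assumes the OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.error
import urllib.request

# URL taken from the curl example above.
API_URL = "http://localhost:8081/v1/chat/completions"


def build_payload(question: str, temperature: float = 0.2, max_tokens: int = 100) -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
    }


def extract_answer(response: dict) -> str:
    """Pull the reply text out of a response (assumed OpenAI-style shape)."""
    return response["choices"][0]["message"]["content"]


def ask(question: str, url: str = API_URL) -> str:
    """POST the question to the server and return the model's reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_answer(json.load(reply))


if __name__ == "__main__":
    try:
        print(ask("Tell me about the capabilities of redis enterprise and when to use it"))
    except urllib.error.URLError as exc:
        print(f"Could not reach {API_URL}: {exc}")
```

Keeping payload construction and response parsing in separate functions makes it possible to check them without a running server.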
