ruoccofabrizio / azure-open-ai-embeddings-qna

A simple web application for an OpenAI-enabled document search. This repo uses Azure OpenAI Service to create embedding vectors from documents. To answer a user's question, it retrieves the most relevant document and then uses GPT-3, GPT-3.5 or GPT-4 to extract the matching answer.

Home Page:https://azure.microsoft.com/en-us/products/cognitive-services/openai-service

embeddings persistence

sbradford006 opened this issue · comments

I'm attempting to have the embeddings in the Redis (api) container persist across a restart.

Having mounted /data to a dir on localhost, I only ever see two dirs (/data/redis and /data/redisinsight). Neither of these seems to contain any data...

I've played around with adding --save config in docker-compose, but I am no Docker wizard and it looks like any config passed at compose time nukes the default config configured in the container.

It's very possible I am misunderstanding how this should all hang together... but any advice would be welcome!
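
One way to check whether the compose-time arguments replaced the defaults is to ask the running server for its effective persistence settings. The following is only a minimal sketch: it assumes the redis-py package and that the container's port 6379 is reachable from the host (neither is confirmed in this thread), and `persistence_summary` is a hypothetical helper name, not something from the repo.

```python
def persistence_summary(client):
    """Return the persistence-related settings of a Redis server.

    `client` is any object exposing redis-py's `config_get(pattern)`,
    which returns a dict such as {"save": "3600 1 300 100 60 10000"}.
    """
    settings = {}
    for name in ("save", "appendonly", "dir", "dbfilename"):
        # CONFIG GET yields an empty dict if the parameter is not known.
        settings[name] = client.config_get(name).get(name)
    # An empty "save" value combined with appendonly "no" means the
    # server is not configured to write anything to disk on its own.
    settings["persists"] = bool(settings["save"]) or settings["appendonly"] == "yes"
    return settings


# Example usage (needs `pip install redis` and a reachable server;
# host and port are assumptions, adjust them to match docker-compose):
#   import redis
#   r = redis.Redis(host="localhost", port=6379, decode_responses=True)
#   print(persistence_summary(r))
```

If `persists` comes back `False`, the server has no snapshot or append-only rule active, which would be consistent with the data directory staying empty.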

commented

@sbradford006 you should have a Redis dump in the storage service

Thanks for your input, itmilos. Do you mean the storage service for the documents?

Currently documents are stored in Azure blob storage, but there is a local (containerized) Redis for embeddings. I don't see anything other than explicitly uploaded documents in the Azure storage, unfortunately.

Have I misunderstood your comment?

commented

If you delete this and reset Redis, you will remove all embeddings

[Screenshot 2023-05-31 at 20 12 12]

Thanks again for your help. I did have that share on the storage account and have removed it. I'm currently assuming it was a holdover from a previous iteration of the project, where the entire service was built in Azure (before I realized how expensive Redis was going to be!), as that file share hasn't been recreated.

Unfortunately, the local Redis instance still appears to lose all embeddings after a container restart, and simply using the batch operation "convert all files and embeddings" doesn't reproduce them.

Any chance you know where the embeddings might be held on the API container, if it isn't /data?
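
A possible way to find out where the server would write its dump is to force a snapshot and read back the configured directory and file name. This is a rough sketch under assumptions: it uses redis-py's `dbsize()`, `save()` and `config_get()`, it assumes the embeddings live as ordinary keys that end up in the RDB snapshot, and `snapshot_and_locate` is a hypothetical helper name.

```python
import posixpath


def snapshot_and_locate(client):
    """Force an RDB snapshot and return (key_count, dump_path).

    `client` is any object exposing redis-py's `dbsize()`, `save()`
    and `config_get(pattern)` methods.
    """
    key_count = client.dbsize()  # number of keys currently held in memory
    client.save()                # blocking SAVE: writes the RDB file now
    dump_dir = client.config_get("dir")["dir"]
    dump_file = client.config_get("dbfilename")["dbfilename"]
    # The path is as seen inside the container, hence POSIX-style joining.
    return key_count, posixpath.join(dump_dir, dump_file)


# Example usage (host and port are assumptions):
#   import redis
#   r = redis.Redis(host="localhost", port=6379, decode_responses=True)
#   print(snapshot_and_locate(r))
```

If the returned path is not under the directory mounted from the host, the snapshot is being written somewhere the volume does not capture, so it would not survive the container being recreated.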